Re: RAID1 fails to recover chunk tree
On 10/30/2014 06:30 AM, Zack Coffey wrote:
> Rob,
>
> That second drive was immediately put to use elsewhere. I figured having only the metadata on that drive, it wouldn't matter. The data stayed single and wasn't part of the second drive, only the metadata was.
>
> I must not be capable of understanding why that wouldn't work. I thought all I was doing was removing a duplication of metadata and the worst I would see is a message complaining about a drive missing. Never thought the data or access to it could be compromised in what seemed to be a simple situation.
>
> Anand,
>
> I get the same output with mount -o recovery,ro.

Your data is gone if your other drive is gone.

Single doesn't mean what you think it means. Single means one single copy of your data, but it has _nothing_ to do with one single drive. If it did, then after a btrfs device add the default would be to never, ever, use that added drive.

RAID0 means striped: the data is divided into chunks, and with N drives (numbered 0 to N-1) chunk=0 is on drive=0 at offset zero, chunk=1 is on drive=1 at offset zero, and so on up to chunk=N-1 on drive=N-1 at offset zero. Chunk=N is then on drive=0 at offset Chunk_Size. And so on.

Concatenation is that drive=N follows drive=N-1 at offset sum(sizeofeach(all drives less than N)). So Byte=0 is on drive=0 at offset 0, and Byte=(sizeof drive0) is on drive=1 at byte=0.

The RAID standard never addressed bulk concatenation, so there is no raid-number for one whole drive after another. BTRFS uses "single", others use other words.

So if you had a 100G drive, and you added a second 100G drive, you'd have a logically 200G drive, where the first 100G is on drive one, and the second is on drive two.

You've basically obliterated the second half of the filesystem storage when you physically removed the drive without semantically removing it first. Might as well have erased it with a magnet, and all the data with it.

Worse still, if you did any sort of balance or defrag you likely moved huge numbers of the _single_ copy of your data clusters onto that other device.

So the layout option isn't about limiting storage; that wouldn't make sense, that's what device add/delete is about. It's about how the data is laid out across all the drives.

All those unreachable addresses are on that now-defunct drive. No mount option will ever get you that data back.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
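The striping and concatenation arithmetic described in the message above can be sketched in a few lines of Python. This is only a simplified model of that description, not the actual btrfs chunk allocator; the function names and the drive sizes are made up for illustration.

```python
def raid0_location(chunk, n_drives, chunk_size):
    """Map a logical chunk index to (drive, byte offset) under plain striping."""
    drive = chunk % n_drives                    # chunks rotate across the drives
    offset = (chunk // n_drives) * chunk_size   # each full rotation advances one chunk
    return drive, offset


def concat_location(byte, drive_sizes):
    """Map a logical byte address to (drive, byte offset) under concatenation."""
    for drive, size in enumerate(drive_sizes):
        if byte < size:
            return drive, byte
        byte -= size                            # skip past this whole drive
    raise ValueError("address beyond the end of the volume")


G = 1024 ** 3
two_drives = [100 * G, 100 * G]                 # hypothetical 100G + 100G layout

print(raid0_location(2, n_drives=2, chunk_size=G))   # third chunk wraps to drive 0
print(concat_location(150 * G, two_drives))          # lands on the second drive
```

In this model, every logical address at or above 100G resolves to the second drive, which is why pulling that drive leaves those addresses unreachable.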
Re: [PATCH 2/2 v4] btrfs-progs: optimize btrfs_scan_lblkid() for multiple calls
On Fri, Oct 31, 2014 at 12:11:20PM +0800, Anand Jain wrote:
> btrfs_scan_lblkid() is called by most of the device-related command functions. And btrfs_scan_lblkid() is the most expensive function, and it becomes more expensive as the number of devices in the system increases. Further some threads call this

Wouldn't it be possible to ask udev rather than scan all devices?

I understand that in some cases it's necessary to have a robust and independent solution, but for the usual use cases it would be less expensive to read the info from udev, where we already keep track of all block devices and where we call libblkid.

It would be possible to implement it as an optional feature (#ifdef HAVE_LIBUDEV); the library API is very easy to use. (For example, lsblk uses libblkid as a fallback; the default is udev.)

Karel

--
Karel Zak k...@redhat.com
http://karelzak.blogspot.com
request for info on the list of parameters to tweak for PCIe SSDs
Hi, Could you kindly help us with the list of all the xfs file system parameters that can be tweaked for the best performance of PCIe-based SSDs? Thanks, Lakshmi
Re: v3.18-rc2 at a 32 bit KVM gives :INFO: trying to register non-static key.the code is fine but needs lockdep annotation.
On 10/31/2014 02:36 AM, Liu Bo wrote:
> On Fri, Oct 31, 2014 at 12:33:43AM +0100, Toralf Förster wrote:
>> On 10/30/2014 11:15 AM, Liu Bo wrote:
>>> On Wed, Oct 29, 2014 at 05:56:33PM +0100, Toralf Förster wrote:
>>>> This is new in my eyes, or ? :
>>>
>>> Also new to me, could you please turn on lock debug and try again?
>>>
>>> thanks,
>>> -liubo
>>
>> you mean CONFIG_PROVE_LOCKING=y right ?
>
> I'd have these open,
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_DEBUG_LOCKDEP=y
>
> __btrfs_alloc_chunk() uses write_[un]lock and spin_[un]lock, it may tell us something.
>
> thanks,
> -liubo

I activated those config options and got immediately after starting the trinity fuzzer :

Oct 31 11:33:31 n22kvmclone kernel: INFO: trying to register non-static key.
Oct 31 11:33:31 n22kvmclone kernel: the code is fine but needs lockdep annotation.
Oct 31 11:33:31 n22kvmclone kernel: turning off the locking correctness validator.
Oct 31 11:33:31 n22kvmclone kernel: CPU: 1 PID: 2470 Comm: trinity-c1 Not tainted 3.18.0-rc2 #3
Oct 31 11:33:31 n22kvmclone kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Oct 31 11:33:31 n22kvmclone kernel: f4209b38 d4e689de d5893078 f4209bc0 d4a89238 d4fb1b3c
Oct 31 11:33:31 n22kvmclone kernel: f4fc8e14 f4209b78 0282 f4fc8df4 f4209b84 0282
Oct 31 11:33:31 n22kvmclone kernel: 0282 f8903a16 f4fc8e04 f4fc8e04 f4209b84 d4e702c2 d5893078
Oct 31 11:33:31 n22kvmclone kernel: Call Trace:
Oct 31 11:33:31 n22kvmclone kernel: [d4e689de] dump_stack+0x41/0x52
Oct 31 11:33:31 n22kvmclone kernel: [d4a89238] __lock_acquire+0x1548/0x1ad0
Oct 31 11:33:31 n22kvmclone kernel: [f8903a16] ? set_avail_alloc_bits+0xd6/0xe0 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [d4e702c2] ? _raw_spin_unlock+0x22/0x30
Oct 31 11:33:31 n22kvmclone kernel: [f8903a16] ? set_avail_alloc_bits+0xd6/0xe0 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f89134f8] ? btrfs_make_block_group+0x1d8/0x290 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [d4a3fdd0] ? native_wbinvd+0x10/0x10
Oct 31 11:33:31 n22kvmclone kernel: [d4a89d7e] lock_acquire+0x9e/0x130
Oct 31 11:33:31 n22kvmclone kernel: [f8955661] ? btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f8951854] __btrfs_alloc_chunk+0x684/0xb10 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f8955661] ? btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f8955661] btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f8909fbd] do_chunk_alloc+0x1dd/0x410 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f89041ae] ? get_alloc_profile+0x17e/0x300 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f890b174] btrfs_check_data_free_space+0x144/0x320 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [f893917b] __btrfs_buffered_write+0x10b/0x550 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [d4aa8187] ? current_kernel_time+0x87/0x120
Oct 31 11:33:31 n22kvmclone kernel: [d4a3fdf0] ? native_restore_fl+0x10/0x10
Oct 31 11:33:31 n22kvmclone kernel: [f89399f0] btrfs_file_write_iter+0x430/0x730 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [d4b5d22a] ? do_iter_readv_writev+0x6a/0xa0
Oct 31 11:33:31 n22kvmclone kernel: [d4b5d22a] do_iter_readv_writev+0x6a/0xa0
Oct 31 11:33:31 n22kvmclone kernel: [f89395c0] ? __btrfs_buffered_write+0x550/0x550 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [d4b5e520] do_readv_writev+0xa0/0x270
Oct 31 11:33:31 n22kvmclone kernel: [f89395c0] ? __btrfs_buffered_write+0x550/0x550 [btrfs]
Oct 31 11:33:31 n22kvmclone kernel: [d4b5d2f0] ? do_sync_readv_writev+0x90/0x90
Oct 31 11:33:31 n22kvmclone kernel: [d4b79a70] ? __fdget_pos+0x30/0x40
Oct 31 11:33:31 n22kvmclone kernel: [d4aa8187] ? current_kernel_time+0x87/0x120
Oct 31 11:33:31 n22kvmclone kernel: [d4a3fdf0] ? native_restore_fl+0x10/0x10
Oct 31 11:33:31 n22kvmclone kernel: [d4ad2dd4] ? __audit_syscall_entry+0xa4/0x100
Oct 31 11:33:31 n22kvmclone kernel: [d4a86c0b] ? trace_hardirqs_on+0xb/0x10
Oct 31 11:33:31 n22kvmclone kernel: [d4b5e724] vfs_writev+0x34/0x60
Oct 31 11:33:31 n22kvmclone kernel: [d4b5e8e6] SyS_writev+0x56/0xe0
Oct 31 11:33:31 n22kvmclone kernel: [d4e70d6f] sysenter_do_call+0x12/0x12

--
Toralf
pgp key: 0076 E94E
request for info on the list of parameters to tweak for PCIe SSDs
Hi, Could you kindly help us with the list of all the btrfs file system parameters that can be tweaked for the best performance of PCIe-based SSDs? Thanks, Lakshmi
Re: [PATCH 2/2 v4] btrfs-progs: optimize btrfs_scan_lblkid() for multiple calls
On 31/10/2014 17:08, Karel Zak wrote:
> On Fri, Oct 31, 2014 at 12:11:20PM +0800, Anand Jain wrote:
>> btrfs_scan_lblkid() is called by most of the device-related command functions. And btrfs_scan_lblkid() is the most expensive function, and it becomes more expensive as the number of devices in the system increases. Further some threads call this
>
> Wouldn't it be possible to ask udev rather than scan all devices?
>
> I understand that in some cases it's necessary to have a robust and independent solution, but for the usual use cases it would be less expensive to read the info from udev, where we already keep track of all block devices and where we call libblkid.
>
> It would be possible to implement it as an optional feature (#ifdef HAVE_LIBUDEV); the library API is very easy to use. (For example, lsblk uses libblkid as a fallback; the default is udev.)
>
> Karel

I might be missing something; correct me if I'm wrong. Is there any udev API which gives me a list of devices which hold btrfs? I just browsed through and there doesn't seem to be one.

Anand
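One way to picture the udev-based lookup discussed above is to filter udev's device database on the ID_FS_TYPE property instead of probing every block device. The following is a minimal sketch, assuming the text comes from `udevadm info --export-db` and that the udev rules on the system have filled in ID_FS_TYPE; the helper name and the sample records are hypothetical and it is not what btrfs-progs does.

```python
def btrfs_devices(export_db_text):
    """Return DEVNAME of every record whose ID_FS_TYPE property is btrfs."""
    devices = []
    # Records in the export are separated by blank lines.
    for record in export_db_text.strip().split("\n\n"):
        props = {}
        for line in record.splitlines():
            # Property lines look like "E: KEY=value".
            if line.startswith("E: ") and "=" in line:
                key, _, value = line[3:].partition("=")
                props[key] = value
        if props.get("ID_FS_TYPE") == "btrfs" and "DEVNAME" in props:
            devices.append(props["DEVNAME"])
    return devices


# Hypothetical sample in the "P:/N:/E:" layout of the udev database export.
SAMPLE = """\
P: /devices/example/block/sda/sda1
N: sda1
E: DEVNAME=/dev/sda1
E: ID_FS_TYPE=xfs

P: /devices/example/block/sda/sda3
N: sda3
E: DEVNAME=/dev/sda3
E: ID_FS_TYPE=btrfs
"""

print(btrfs_devices(SAMPLE))
```

On a live system the same function could be fed the captured output of the udevadm command; a scan through libblkid would still be needed as a fallback where udev is not running.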
Re: request for info on the list of parameters to tweak for PCIe SSDs
Hi Lakshmi,

Personally, I haven't done many benchmarks for performance tuning, but here is some information you could refer to.

If you are interested, the following URL documents the btrfs mount options:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/btrfs.txt

The wiki also has a lot of information which might help you:
https://btrfs.wiki.kernel.org/index.php/Main_Page

There are also some options that you could tune at mkfs time; see 'man mkfs.btrfs' for details. I suppose, for example, that different 'nodesize/leafsize' values will affect performance.

Btrfs also has some new features, like skinny_metadata, no_holes etc. You could clone the latest btrfs-progs tools from:
http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/

Some features are not enabled by default (skinny_metadata, no_holes etc.), but for benchmarking you could turn them on, using the latest kernel with the latest btrfs-progs:

[root@localhost autotest]# mkfs.btrfs -O list-all
Filesystem features available at mkfs time:
mixed-bg            - mixed data and metadata block groups (0x4)
extref              - increased hardlink limit per file to 65536 (0x40, default)
raid56              - raid56 extended format (0x80)
skinny-metadata     - reduced-size metadata extent refs (0x100)
no-holes            - no explicit hole extents for files (0x200)

It would be good if you could do some benchmarks for Btrfs with different parameters, and feel free to let others know your results (I am also curious). ^_^

Best Regards,
Wang Shilong

> Hi, Could you kindly help us with the list of all the btrfs file system parameters , that can be tweaked for the best performance of the PCIe based SSDs ?
>
> Thanks, Lakshmi
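For the benchmarking suggested above, a small script can enumerate the mkfs variants to test. This is only a sketch that builds command strings and runs nothing; the device path and the chosen nodesize values are assumptions for illustration, and the feature names are the ones from the list-all output in the message.

```python
from itertools import product


def mkfs_commands(device, feature_sets, nodesizes):
    """Build one mkfs.btrfs command line per (feature set, nodesize) pair."""
    commands = []
    for features, nodesize in product(feature_sets, nodesizes):
        cmd = ["mkfs.btrfs", "-f", "-n", str(nodesize)]
        if features:
            # -O takes a comma-separated list of feature names.
            cmd += ["-O", ",".join(features)]
        cmd.append(device)
        commands.append(" ".join(cmd))
    return commands


# Hypothetical test matrix: defaults versus the two newer features,
# each with two node sizes.
matrix = mkfs_commands(
    "/dev/example",                        # placeholder, not a real device
    feature_sets=[(), ("skinny-metadata", "no-holes")],
    nodesizes=[4096, 16384],
)
for line in matrix:
    print(line)
```

Each generated filesystem would then be mounted and measured with the same workload so that only one parameter changes between runs.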
Re: RAID1 fails to recover chunk tree
Sadly I think I understand now.

So by adding the second drive, BTRFS saw it as an extension of data (a la JBOD-ish?). Even though I thought I was only adding RAID1 for metadata, I was also adding to the data storage.

I assume that even though chunk-recover reports healthy chunks, there's little to no way to actually get them?

On 10/31/2014 4:35 AM, Robert White wrote:
> Your data is gone if your other drive is gone.
>
> Single doesn't mean what you think it means. Single means one single copy of your data, but it has _nothing_ to do with one single drive.
>
> [...]
>
> All those unreachable addresses are on that now-defunct drive. No mount option will ever get you that data back.
[PATCH] Btrfs: move read only block groups onto their own list V2
Our gluster boxes were spending lots of time in statfs because our fs'es are huge. The problem is statfs loops through all of the block groups looking for read only block groups, and when you have several terabytes worth of data that ends up being a lot of block groups. Move the read only block groups onto a read only list and only process that list in btrfs_account_ro_block_groups_free_space to reduce the amount of churn. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
V1->V2:
-list_for_each_entry was using the wrong ->member name.

 fs/btrfs/ctree.h | 4
 fs/btrfs/extent-tree.c | 36 +---
 2 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d557264e..438f087 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1170,6 +1170,7 @@ struct btrfs_space_info {
 	struct percpu_counter total_bytes_pinned;
 
 	struct list_head list;
+	struct list_head ro_bgs;
 
 	struct rw_semaphore groups_sem;
 	/* for block groups in our same type */
@@ -1305,6 +1306,9 @@ struct btrfs_block_group_cache {
 
 	/* For delayed block group creation or deletion of empty block groups */
 	struct list_head bg_list;
+
+	/* For read-only block groups */
+	struct list_head ro_list;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0d599ba..f51004f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3518,6 +3518,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 	found->chunk_alloc = 0;
 	found->flush = 0;
 	init_waitqueue_head(&found->wait);
+	INIT_LIST_HEAD(&found->ro_bgs);
 
 	ret = kobject_init_and_add(&found->kobj, &space_info_ktype,
 				    info->space_info_kobj, "%s",
@@ -8525,6 +8526,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 	    min_allocable_bytes <= sinfo->total_bytes) {
 		sinfo->bytes_readonly += num_bytes;
 		cache->ro = 1;
+		list_add_tail(&cache->ro_list, &sinfo->ro_bgs);
 		ret = 0;
 	}
 out:
@@ -8579,15 +8581,20 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
 
 /*
  * helper to account the unused space of all the readonly block group in the
- * list. takes mirrors into account.
+ * space_info. takes mirrors into account.
  */
-static u64 __btrfs_get_ro_block_group_free_space(struct list_head *groups_list)
+u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
 {
 	struct btrfs_block_group_cache *block_group;
 	u64 free_bytes = 0;
 	int factor;
 
-	list_for_each_entry(block_group, groups_list, list) {
+	/* It's df, we don't care if it's racey */
+	if (list_empty(&sinfo->ro_bgs))
+		return 0;
+
+	spin_lock(&sinfo->lock);
+	list_for_each_entry(block_group, &sinfo->ro_bgs, ro_list) {
 		spin_lock(&block_group->lock);
 
 		if (!block_group->ro) {
@@ -8608,26 +8615,6 @@ static u64 __btrfs_get_ro_block_group_free_space(struct list_head *groups_list)
 
 		spin_unlock(&block_group->lock);
 	}
-
-	return free_bytes;
-}
-
-/*
- * helper to account the unused space of all the readonly block group in the
- * space_info. takes mirrors into account.
- */
-u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
-{
-	int i;
-	u64 free_bytes = 0;
-
-	spin_lock(&sinfo->lock);
-
-	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
-		if (!list_empty(&sinfo->block_groups[i]))
-			free_bytes += __btrfs_get_ro_block_group_free_space(
-						&sinfo->block_groups[i]);
-
 	spin_unlock(&sinfo->lock);
 
 	return free_bytes;
@@ -8647,6 +8634,7 @@ void btrfs_set_block_group_rw(struct btrfs_root *root,
 		    cache->bytes_super - btrfs_block_group_used(&cache->item);
 	sinfo->bytes_readonly -= num_bytes;
 	cache->ro = 0;
+	list_del_init(&cache->ro_list);
 	spin_unlock(&cache->lock);
 	spin_unlock(&sinfo->lock);
 }
@@ -9016,6 +9004,7 @@ btrfs_create_block_group_cache(struct btrfs_root *root, u64 start, u64 size)
 	INIT_LIST_HEAD(&cache->list);
 	INIT_LIST_HEAD(&cache->cluster_list);
 	INIT_LIST_HEAD(&cache->bg_list);
+	INIT_LIST_HEAD(&cache->ro_list);
 	btrfs_init_free_space_ctl(cache);
 
 	return cache;
@@ -9425,6 +9414,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	 * are still on the list after taking the semaphore
 	 */
 	list_del_init(&block_group->list);
+	list_del_init(&block_group->ro_list);
 	if (list_empty(&block_group->space_info->block_groups[index])) {
 		kobj = block_group->space_info->block_group_kobjs[index];
 		block_group->space_info->block_group_kobjs[index] = NULL;
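The idea behind the patch above, keeping read-only block groups on their own list so that statfs only walks those, can be modelled outside the kernel. The Python below is a rough sketch with made-up class and function names and without any locking; it mirrors the shape of the change rather than the kernel code itself.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)          # identity comparison, so list.remove() hits the right object
class BlockGroup:
    size: int
    used: int
    factor: int = 1           # 2 stands in for mirrored profiles
    ro: bool = False


@dataclass
class SpaceInfo:
    block_groups: list = field(default_factory=list)   # every block group
    ro_bgs: list = field(default_factory=list)         # only the read-only ones


def set_block_group_ro(sinfo, bg):
    if not bg.ro:
        bg.ro = True
        sinfo.ro_bgs.append(bg)


def set_block_group_rw(sinfo, bg):
    if bg.ro:
        bg.ro = False
        sinfo.ro_bgs.remove(bg)


def account_ro_free_space(sinfo):
    """Sum unused space over the read-only list only, not over all groups."""
    return sum((bg.size - bg.used) * bg.factor for bg in sinfo.ro_bgs)


sinfo = SpaceInfo()
for _ in range(10000):
    sinfo.block_groups.append(BlockGroup(size=1024, used=512))
sinfo.block_groups[7].factor = 2
set_block_group_ro(sinfo, sinfo.block_groups[3])
set_block_group_ro(sinfo, sinfo.block_groups[7])

print(len(sinfo.block_groups), len(sinfo.ro_bgs), account_ro_free_space(sinfo))
```

With ten thousand block groups and two of them read-only, the accounting loop touches two entries instead of ten thousand, which is the churn the commit message refers to.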
Re: v3.18-rc2 at a 32 bit KVM gives :INFO: trying to register non-static key.the code is fine but needs lockdep annotation.
On Fri, Oct 31, 2014 at 11:36:05AM +0100, Toralf Förster wrote:
> On 10/31/2014 02:36 AM, Liu Bo wrote:
>> I'd have these open,
>> CONFIG_DEBUG_SPINLOCK=y
>> CONFIG_DEBUG_LOCK_ALLOC=y
>> CONFIG_DEBUG_LOCKDEP=y
>>
>> __btrfs_alloc_chunk() uses write_[un]lock and spin_[un]lock, it may tell us something.
>
> I activated those config options and got immediately after starting the trinity fuzzer :

Can you please tell me the trinity options? (I'm using trinity --dangerous -C 2 -N 200 -c writev -q -l off, but I only got a soft lockup one time; the others are OOM messages.)

thanks,
-liubo

> Oct 31 11:33:31 n22kvmclone kernel: INFO: trying to register non-static key.
> Oct 31 11:33:31 n22kvmclone kernel: the code is fine but needs lockdep annotation.
> Oct 31 11:33:31 n22kvmclone kernel: turning off the locking correctness validator.
> [...]
Re: [PATCH] Btrfs: move read only block groups onto their own list V2
On Fri, Oct 31, 2014 at 09:49:34AM -0400, Josef Bacik wrote:
> Our gluster boxes were spending lots of time in statfs because our fs'es are huge. The problem is statfs loops through all of the block groups looking for read only block groups, and when you have several terabytes worth of data that ends up being a lot of block groups. Move the read only block groups onto a read only list and only process that list in btrfs_account_ro_block_groups_free_space to reduce the amount of churn. Thanks,

Looks good.

Reviewed-by: Liu Bo bo.li@oracle.com

-liubo

> Signed-off-by: Josef Bacik jba...@fb.com
> ---
> V1->V2:
> -list_for_each_entry was using the wrong ->member name.
>
> fs/btrfs/ctree.h | 4
> fs/btrfs/extent-tree.c | 36 +---
> 2 files changed, 17 insertions(+), 23 deletions(-)
>
> [...]
Re: v3.18-rc2 at a 32 bit KVM gives :INFO: trying to register non-static key.the code is fine but needs lockdep annotation.
On 10/31/2014 03:12 PM, Liu Bo wrote:
> Can you please tell me the trinity option? (I'm using trinity --dangerous -C 2 -N 200 -c writev -q -l off, But I only got softlockup for one time, others are OOM messages.)
>
> thanks,
> -liubo

I'm running within the 32 bit KVM guest :

$ mkdir /mnt/ramdisk/btrfs; truncate -s 97M /mnt/ramdisk/btrfs.fs; /sbin/mkfs.btrfs /mnt/ramdisk/btrfs.fs; sudo su -c "mount -o loop,compress=lzo /mnt/ramdisk/btrfs.fs /mnt/ramdisk/btrfs"; chmod 777 /mnt/ramdisk/btrfs

followed by :

$ D=/mnt/ramdisk/btrfs; while [[ : ]]; do cd ~; sudo rm -rf $D/t3 && mkdir $D/t3 || break; cd $D/t3; mkdir -p v1/v2; for i in $(seq 0 99); do touch v1/v2/f$i; mkdir v1/v2/d$i; done; trinity -C 2 -N 10 -V $D/t3/v1/v2 -q; echo; echo done; echo; sleep 4; done

(latest git tree of trinity and Btrfs v3.14.2)

--
Toralf
pgp key: 0076 E94E
Re: Quota question
Besides the first question, I ran into an issue using parent groups (see http://pastebin.com/asT5ZFsi). I can't reproduce it every time, but it seems to appear frequently. Is there any known bug that could be the source of this error? I'm using version 3.12 on Trusty.

Thanks
--
Cyril SCETBON

On 30 Oct 2014, at 19:37, Cyril Scetbon cyril.scet...@free.fr wrote:
> Hi,
>
> At https://btrfs.wiki.kernel.org/index.php/Quota_support it's said that "Using btrfs subvolume delete will break qgroup unshared space usage". Does it mean that if the qgroup associated with the subvolume is attached to another qgroup, this other qgroup won't be able to correctly apply its quota ? If yes, is there a workaround for that ?
>
> Thanks
> --
> Cyril SCETBON
[PATCH] Btrfs: don't take the chunk_mutex/dev_list mutex in statfs
Our gluster boxes get several thousand statfs() calls per second, which begins to suck hardcore with all of the lock contention on the chunk mutex and dev list mutex. We don't really need to hold these things; if we have transient weirdness with statfs() because of the chunk allocator we don't care, so remove this locking.

We still need the dev_list lock if you mount with -o alloc_start however, which is a good argument for nuking that thing from orbit, but that's a patch for another day. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/super.c | 69
 1 file changed, 45 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 54bd91e..2c9ba11 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1644,8 +1644,20 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 	int i = 0, nr_devices;
 	int ret;
 
+	/*
+	 * We aren't under the device list lock, so this is racey-ish, but good
+	 * enough for our purposes.
+	 */
 	nr_devices = fs_info->fs_devices->open_devices;
-	BUG_ON(!nr_devices);
+	if (!nr_devices) {
+		smp_mb();
+		nr_devices = fs_info->fs_devices->open_devices;
+		ASSERT(nr_devices);
+		if (!nr_devices) {
+			*free_bytes = 0;
+			return 0;
+		}
+	}
 
 	devices_info = kmalloc_array(nr_devices, sizeof(*devices_info),
 			       GFP_NOFS);
@@ -1670,11 +1682,17 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 	else
 		min_stripe_size = BTRFS_STRIPE_LEN;
 
-	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+	if (fs_info->alloc_start)
+		mutex_lock(&fs_devices->device_list_mutex);
+	rcu_read_lock();
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
 		if (!device->in_fs_metadata || !device->bdev ||
 		    device->is_tgtdev_for_dev_replace)
 			continue;
 
+		if (i >= nr_devices)
+			break;
+
 		avail_space = device->total_bytes - device->bytes_used;
 
 		/* align with stripe_len */
@@ -1690,23 +1708,30 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 
 		/* user can set the offset in fs_info->alloc_start. */
 		if (fs_info->alloc_start + BTRFS_STRIPE_LEN <=
-		    device->total_bytes)
+		    device->total_bytes) {
+			rcu_read_unlock();
 			skip_space = max(fs_info->alloc_start, skip_space);
 
-		/*
-		 * btrfs can not use the free space in [0, skip_space - 1],
-		 * we must subtract it from the total. In order to implement
-		 * it, we account the used space in this range first.
-		 */
-		ret = btrfs_account_dev_extents_size(device, 0, skip_space - 1,
-						     &used_space);
-		if (ret) {
-			kfree(devices_info);
-			return ret;
-		}
+			/*
+			 * btrfs can not use the free space in
+			 * [0, skip_space - 1], we must subtract it from the
+			 * total. In order to implement it, we account the used
+			 * space in this range first.
+			 */
+			ret = btrfs_account_dev_extents_size(device, 0,
+							     skip_space - 1,
+							     &used_space);
+			if (ret) {
+				kfree(devices_info);
+				mutex_unlock(&fs_devices->device_list_mutex);
+				return ret;
+			}
 
-		/* calc the free space in [0, skip_space - 1] */
-		skip_space -= used_space;
+			rcu_read_lock();
+
+			/* calc the free space in [0, skip_space - 1] */
+			skip_space -= used_space;
+		}
 
 		/*
 		 * we can use the free space in [0, skip_space - 1], subtract
@@ -1725,6 +1750,9 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 
 		i++;
 	}
+	rcu_read_unlock();
+	if (fs_info->alloc_start)
+		mutex_unlock(&fs_devices->device_list_mutex);
 
 	nr_devices = i;
 
@@ -1787,8 +1815,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	 * holding chunk_muext to avoid allocating new chunks, holding
 	 * device_list_mutex to avoid the
which subvolume is mounted?
let’s first assume the contents of /etc/fstab are either not used or invalid in mounting the subvolumes. given the following ‘df’ command, how do i know which subvolume of the btrfs filesystem on /dev/sda3 is mounted at each mount point (/, /var, /opt, /home)?

i would have expected to see the mount option used to define the subvolume (subvolid or subvol option) in /proc/mounts.

# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda3        6839296 1698564   4903212  26% /
devtmpfs          501464       0    501464   0% /dev
tmpfs             507316       0    507316   0% /dev/shm
tmpfs             507316    6720    500596   2% /run
tmpfs             507316       0    507316   0% /sys/fs/cgroup
/dev/sda3        6839296 1698564   4903212  26% /var
/dev/sda3        6839296 1698564   4903212  26% /opt
/dev/sda3        6839296 1698564   4903212  26% /home
/dev/sda1         517868   93040    424828  18% /boot

# btrfs subvolume list -a --sort=+rootid /
ID 257 gen 7800 top level 5 path <FS_TREE>/root
ID 258 gen 4127 top level 5 path <FS_TREE>/home
ID 259 gen 7801 top level 5 path <FS_TREE>/var
ID 260 gen 7795 top level 5 path <FS_TREE>/opt

# uname -a
Linux turner11.storix 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
Btrfs v3.12

# btrfs fi show
Label: rhel_turner11  uuid: cd3c0e50-d726-44e2-9bfa-19b11614136a
	Total devices 1 FS bytes used 1.62GiB
	devid    1 size 6.52GiB used 2.24GiB path /dev/sda3

Btrfs v3.12

# btrfs fi df /
Data, single: total=1.98GiB, used=1.58GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=264.00MiB, used=36.03MiB
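One possible way to approach the question above, sketched under assumptions: /proc/self/mountinfo carries a "root" field (the fourth field) giving the path inside the filesystem that forms the root of each mount. On recent kernels this shows the subvolume path for btrfs mounts; whether the 3.10-based kernel in this report fills it in the same way is an assumption that would need checking on the machine itself. The struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct mount_entry {
	char root[256];        /* path inside the filesystem used as mount root */
	char mount_point[256]; /* where it is mounted */
	char fstype[64];       /* filesystem type, e.g. "btrfs" */
};

/* Parse one mountinfo line. Returns 0 on success, -1 on malformed input. */
int parse_mountinfo_line(const char *line, struct mount_entry *e)
{
	const char *sep;

	assert(line != NULL && e != NULL);
	/* Fields 1-3 are mount ID, parent ID and major:minor; skip them. */
	if (sscanf(line, "%*d %*d %*s %255s %255s", e->root, e->mount_point) != 2)
		return -1;
	/* The filesystem type follows the " - " separator. */
	sep = strstr(line, " - ");
	if (!sep || sscanf(sep + 3, "%63s", e->fstype) != 1)
		return -1;
	return 0;
}

/* Print "mount point <- root" for each btrfs mount; returns the count. */
int print_btrfs_mounts(FILE *out)
{
	FILE *f = fopen("/proc/self/mountinfo", "r");
	char line[4096];
	struct mount_entry e;
	int n = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (parse_mountinfo_line(line, &e) == 0 &&
		    strcmp(e.fstype, "btrfs") == 0) {
			fprintf(out, "%s <- %s\n", e.mount_point, e.root);
			n++;
		}
	}
	fclose(f);
	return n;
}
```

Calling print_btrfs_mounts(stdout) lists one line per btrfs mount, which can then be compared against the paths printed by "btrfs subvolume list".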
Re: filesystem corruption
I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add the second one as there are only two slots in that server.

This is what I got:

tobby@ubuntu: sudo btrfs check /dev/sdb1
warning, device 2 is missing
warning devid 2 not found already
root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
enabling repair mode
warning, device 2 is missing
warning devid 2 not found already
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x42bd62]
btrfs[0x42ffe5]
btrfs[0x430211]
btrfs[0x4246ec]
btrfs[0x424d11]
btrfs[0x426af3]
btrfs[0x41b18c]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffca1119ec5]
btrfs[0x40b497]

This can be repeated as often as I want ;) Nothing changed.

Regards
Tobias

2014-10-31 3:41 GMT+01:00 Rich Freeman <r-bt...@thefreemanclan.net>:
> On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst <to...@tobby.eu> wrote:
>> Addition:
>> I found some posts here about a general file system corruption in 3.17
>> and 3.17.1 - is this the cause?
>> Additionally I am using ro-snapshots - maybe this is the cause, too?
>>
>> Anyway: Can I fix that or do I have to reinstall? Haven't touched the
>> filesystem, just did a scrub (found 0 errors).
>
> Yup - ro-snapshots is a big problem in 3.17. You can probably recover now by:
> 1. Update your kernel to 3.17.2 - that takes care of all the big known
> 3.16/17 issues in general.
> 2. Run btrfs check using btrfs-tools 3.17. That can clean up the
> broken snapshots in your filesystem.
>
> That is fairly likely to get your filesystem working normally again.
> It worked for me. I was getting some balance issues when trying to add
> another device and I'm not sure if 3.17.2 totally fixed that - I ended
> up cancelling the balance and it will be a while before I have to
> balance this particular filesystem again, so I'll just hold off and
> hope things stabilize.
>
> --
> Rich
[PATCH 03/11] Btrfs-progs: allow fsck to take the tree bytenr
Sometimes we have a pretty corrupted fs but have an old tree bytenr that we
could use, add the ability to specify the tree root bytenr.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 cmds-check.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 2a5f823..38f8d11 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -7546,6 +7546,7 @@ static struct option long_options[] = {
 	{ "backup", 0, NULL, 0 },
 	{ "subvol-extents", no_argument, NULL, 'E' },
 	{ "qgroup-report", 0, NULL, 'Q' },
+	{ "tree-root", 1, NULL, 'r' },
 	{ NULL, 0, NULL, 0}
 };

@@ -7561,6 +7562,7 @@ const char * const cmd_check_usage[] = {
 	"--check-data-csum verify checkums of data blocks",
 	"--qgroup-report print a report on qgroup consistency",
 	"--subvol-extents print subvolume extents and sharing state",
+	"--tree-root use the given bytenr for the tree root",
 	NULL
 };

@@ -7571,6 +7573,7 @@ int cmd_check(int argc, char **argv)
 	struct btrfs_fs_info *info;
 	u64 bytenr = 0;
 	u64 subvolid = 0;
+	u64 tree_root_bytenr = 0;
 	char uuidbuf[BTRFS_UUID_UNPARSED_SIZE];
 	int ret;
 	u64 num;
@@ -7581,7 +7584,7 @@ int cmd_check(int argc, char **argv)

 	while(1) {
 		int c;
-		c = getopt_long(argc, argv, "as:b", long_options,
+		c = getopt_long(argc, argv, "as:br:", long_options,
 				&option_index);
 		if (c < 0)
 			break;
@@ -7608,6 +7611,9 @@ int cmd_check(int argc, char **argv)
 		case 'E':
 			subvolid = arg_strtou64(optarg);
 			break;
+		case 'r':
+			tree_root_bytenr = arg_strtou64(optarg);
+			break;
 		case '?':
 		case 'h':
 			usage(cmd_check_usage);
@@ -7651,7 +7657,8 @@ int cmd_check(int argc, char **argv)
 	if (repair)
 		ctree_flags |= OPEN_CTREE_PARTIAL;

-	info = open_ctree_fs_info(argv[optind], bytenr, 0, ctree_flags);
+	info = open_ctree_fs_info(argv[optind], bytenr, tree_root_bytenr,
+				  ctree_flags);
 	if (!info) {
 		fprintf(stderr, "Couldn't open file system\n");
 		ret = -EIO;
-- 
1.8.3.1
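The option plumbing in the patch above follows the usual getopt_long() pattern: a long option that requires an argument is mapped to a short option letter, and the argument is converted to a 64-bit number. Below is a stand-alone sketch of that pattern. It is not the btrfs-progs code: parse_tree_root is a hypothetical helper, and plain strtoull() stands in for the project's arg_strtou64().

```c
#include <assert.h>
#include <errno.h>
#include <getopt.h>
#include <stdlib.h>

/*
 * Hypothetical helper: look for --tree-root <bytenr> (or -r <bytenr>)
 * in argv. Returns 0 on success (with *bytenr left at 0 if the option
 * was not given) and -1 on an unknown option or a malformed number.
 */
int parse_tree_root(int argc, char **argv, unsigned long long *bytenr)
{
	static const struct option long_options[] = {
		{ "tree-root", required_argument, NULL, 'r' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	assert(argv != NULL && bytenr != NULL);
	*bytenr = 0;
	optind = 1;	/* restart scanning so the helper can be called again */
	while ((c = getopt_long(argc, argv, "r:", long_options, NULL)) != -1) {
		char *end;

		switch (c) {
		case 'r':
			errno = 0;
			*bytenr = strtoull(optarg, &end, 0);
			if (errno || end == optarg || *end != '\0')
				return -1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

The trailing colon in "r:" is what tells getopt_long() that -r takes an argument, matching the "r:" added to the option string in the patch.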
[PATCH 04/11] Btrfs-progs: don't fail on log tree opening with PARTIAL
We were failing to fsck a volume because we couldn't open the log tree, which is
not helpful.  Make us skip erroring out if we are using OPEN_CTREE_PARTIAL since
it isn't a mandatory tree.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/disk-io.c b/disk-io.c
index 77fc610..d2c18a8 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -930,7 +930,8 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	ret = find_and_setup_log_root(root, fs_info, sb);
 	if (ret) {
 		printk("Couldn't setup log root tree\n");
-		return -EIO;
+		if (!(flags & OPEN_CTREE_PARTIAL))
+			return -EIO;
 	}

 	fs_info->generation = generation;
-- 
1.8.3.1
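The change above boils down to "report the failure, but only treat it as fatal when the caller did not ask for a partial open". A small self-contained sketch of that flag test follows; the flag value and the function name are invented for illustration and do not match the btrfs-progs definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define OPEN_PARTIAL (1U << 0)	/* hypothetical flag bit */

/*
 * Hypothetical helper: `setup_ret` is the result of setting up an
 * optional tree named `what`. A failure is always reported, but it
 * only aborts the open when OPEN_PARTIAL is not set.
 */
int finish_optional_tree(const char *what, int setup_ret, unsigned int flags)
{
	assert(what != NULL);
	if (setup_ret) {
		fprintf(stderr, "Couldn't setup %s\n", what);
		if (!(flags & OPEN_PARTIAL))
			return -EIO;
	}
	return 0;
}
```

With the flag set, the caller continues with whatever trees could be opened, which is the behaviour a repair tool wants.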
[PATCH 01/11] Btrfs-progs: add the ability to delete items
Sometimes you just need to delete an item, add that functionality to
btrfs-corrupt-block.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 btrfs-corrupt-block.c | 45
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 171c81d..af9ae4d 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -111,6 +111,7 @@ static void print_usage(void)
 	fprintf(stderr, "\t-I An item to corrupt (must also specify the field "
 		"to corrupt and a root+key for the item)\n");
 	fprintf(stderr, "\t-D Corrupt a dir item, must specify key and field\n");
+	fprintf(stderr, "\t-d Delete this item (must specify -K)\n");
 	exit(1);
 }

@@ -811,6 +812,39 @@ out:
 	return ret;
 }

+static int delete_item(struct btrfs_root *root, struct btrfs_key *key)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_path *path;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	trans = btrfs_start_transaction(root, 1);
+	if (IS_ERR(trans)) {
+		btrfs_free_path(path);
+		fprintf(stderr, "Couldn't start transaction %ld\n",
+			PTR_ERR(trans));
+		return PTR_ERR(trans);
+	}
+
+	ret = btrfs_search_slot(trans, root, key, path, -1, 1);
+	if (ret) {
+		if (ret > 0)
+			ret = -ENOENT;
+		fprintf(stderr, "Error searching to node %d\n", ret);
+		goto out;
+	}
+	ret = btrfs_del_item(trans, root, path);
+	btrfs_mark_buffer_dirty(path->nodes[0]);
+out:
+	btrfs_commit_transaction(trans, root);
+	btrfs_free_path(path);
+	return ret;
+}
+
 static struct option long_options[] = {
 	/* { "byte-count", 1, NULL, 'b' }, */
 	{ "logical", 1, NULL, 'l' },
@@ -828,6 +862,7 @@ static struct option long_options[] = {
 	{ "key", 1, NULL, 'K'},
 	{ "item", 0, NULL, 'I'},
 	{ "dir-item", 0, NULL, 'D'},
+	{ "delete", 0, NULL, 'd'},
 	{ 0, 0, 0, 0}
 };

@@ -993,6 +1028,7 @@ int main(int ac, char **av)
 	int chunk_tree = 0;
 	int corrupt_item = 0;
 	int corrupt_di = 0;
+	int delete = 0;
 	u64 metadata_block = 0;
 	u64 inode = 0;
 	u64 file_extent = (u64)-1;
@@ -1004,7 +1040,7 @@ int main(int ac, char **av)

 	while(1) {
 		int c;
-		c = getopt_long(ac, av, "l:c:b:eEkuUi:f:x:m:K:ID", long_options,
+		c = getopt_long(ac, av, "l:c:b:eEkuUi:f:x:m:K:IDd", long_options,
 				&option_index);
 		if (c < 0)
 			break;
@@ -1060,6 +1096,8 @@ int main(int ac, char **av)
 			break;
 		case 'I':
 			corrupt_item = 1;
+		case 'd':
+			delete = 1;
 			break;
 		default:
 			print_usage();
@@ -1167,6 +1205,11 @@ int main(int ac, char **av)
 		if (!key.objectid)
 			print_usage();
 		ret = corrupt_btrfs_item(root, &key, field);
+	}
+	if (delete) {
+		if (!key.objectid)
+			print_usage();
+		ret = delete_item(root, &key);
 		goto out_close;
 	}
 	if (key.objectid || key.offset || key.type) {
-- 
1.8.3.1
[PATCH 05/11] Btrfs-progs: spit out the broken file when ignoring errors
It's nice to ignore errors on restore, but spit out the filename so the user
knows which files of his aren't going to look right.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 cmds-restore.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cmds-restore.c b/cmds-restore.c
index 38a131e..b52d5c8 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -840,6 +840,8 @@ static int search_dir(struct btrfs_root *root, struct btrfs_key *key,
 			ret = copy_file(root, fd, &location, path_name);
 			close(fd);
 			if (ret) {
+				fprintf(stderr, "Error copying data for %s\n",
+					path_name);
 				if (ignore_errors)
 					goto next;
 				btrfs_free_path(path);
@@ -917,6 +919,8 @@ static int search_dir(struct btrfs_root *root, struct btrfs_key *key,
 					 output_rootdir, dir, mreg);
 			free(dir);
 			if (ret) {
+				fprintf(stderr, "Error searching %s\n",
+					path_name);
 				if (ignore_errors)
 					goto next;
 				btrfs_free_path(path);
-- 
1.8.3.1
[PATCH 06/11] Btrfs-progs: make zero-log use partial open
Because seriously, we only want to kill the tree log.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 btrfs-zero-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/btrfs-zero-log.c b/btrfs-zero-log.c
index 88998e9..46c705c 100644
--- a/btrfs-zero-log.c
+++ b/btrfs-zero-log.c
@@ -61,7 +61,7 @@ int main(int ac, char **av)
 		goto out;
 	}

-	root = open_ctree(av[1], 0, OPEN_CTREE_WRITES);
+	root = open_ctree(av[1], 0, OPEN_CTREE_WRITES | OPEN_CTREE_PARTIAL);
 	if (root == NULL)
 		return 1;

-- 
1.8.3.1
[PATCH 07/11] Btrfs-progs: add a message to know zero log ran successfully
If there are errors when opening the fs because of PARTIAL we could think that
the zero-log didn't actually work.  Add a printf so we know that it was
successful.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 btrfs-zero-log.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/btrfs-zero-log.c b/btrfs-zero-log.c
index 46c705c..4154175 100644
--- a/btrfs-zero-log.c
+++ b/btrfs-zero-log.c
@@ -71,6 +71,7 @@ int main(int ac, char **av)
 	btrfs_set_super_log_root_level(root->fs_info->super_copy, 0);
 	btrfs_commit_transaction(trans, root);
 	close_ctree(root);
+	printf("Log root zero'ed\n");
 out:
 	return !!ret;
 }
-- 
1.8.3.1
[GIT PULL] Various btrfsck updates
Please pull

  https://github.com/josefbacik/btrfs-progs.git for-kdave

This is all the work from fixing a random IRC user's broken file system. This also includes two patches that were not picked up previously for some reason, and includes 4 new test images. The test images may not make it to the list, but they are in the git tree (they are patch 02/11 and 11/11). Thanks,

Josef
[PATCH 09/11] Btrfs-progs: fix missing inode items
If we have all the other items but no inode item we can recreate it for the most
part, with the exception of the permissions and ownership.  Add this ability to
btrfsck.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 cmds-check.c | 76
 1 file changed, 76 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index be75dcb..a30db8d 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1727,6 +1727,61 @@ static int delete_dir_index(struct btrfs_root *root,
 	return ret;
 }

+static int create_inode_item(struct btrfs_root *root,
+			     struct inode_record *rec,
+			     struct inode_backref *backref, int root_dir)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_inode_item inode_item;
+	time_t now = time(NULL);
+	int ret;
+
+	trans = btrfs_start_transaction(root, 1);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		return ret;
+	}
+
+	fprintf(stderr, "root %llu inode %llu recreating inode item, this may "
+		"be incomplete, please check permissions and content after "
+		"the fsck completes.\n", (unsigned long long)root->objectid,
+		(unsigned long long)rec->ino);
+
+	memset(&inode_item, 0, sizeof(inode_item));
+	btrfs_set_stack_inode_generation(&inode_item, trans->transid);
+	if (root_dir)
+		btrfs_set_stack_inode_nlink(&inode_item, 1);
+	else
+		btrfs_set_stack_inode_nlink(&inode_item, rec->found_link);
+	btrfs_set_stack_inode_nbytes(&inode_item, rec->found_size);
+	if (rec->found_dir_item) {
+		if (rec->found_file_extent)
+			fprintf(stderr, "root %llu inode %llu has both a dir "
+				"item and extents, unsure if it is a dir or a "
+				"regular file so setting it as a directory\n",
+				(unsigned long long)root->objectid,
+				(unsigned long long)rec->ino);
+		btrfs_set_stack_inode_mode(&inode_item, S_IFDIR | 0755);
+		btrfs_set_stack_inode_size(&inode_item, rec->found_size);
+	} else if (!rec->found_dir_item) {
+		btrfs_set_stack_inode_size(&inode_item, rec->extent_end);
+		btrfs_set_stack_inode_mode(&inode_item, S_IFREG | 0755);
+	}
+	btrfs_set_stack_timespec_sec(&inode_item.atime, now);
+	btrfs_set_stack_timespec_nsec(&inode_item.atime, 0);
+	btrfs_set_stack_timespec_sec(&inode_item.ctime, now);
+	btrfs_set_stack_timespec_nsec(&inode_item.ctime, 0);
+	btrfs_set_stack_timespec_sec(&inode_item.mtime, now);
+	btrfs_set_stack_timespec_nsec(&inode_item.mtime, 0);
+	btrfs_set_stack_timespec_sec(&inode_item.otime, 0);
+	btrfs_set_stack_timespec_nsec(&inode_item.otime, 0);
+
+	ret = btrfs_insert_inode(trans, root, rec->ino, &inode_item);
+	BUG_ON(ret);
+	btrfs_commit_transaction(trans, root);
+	return 0;
+}
+
 static int repair_inode_backrefs(struct btrfs_root *root,
 				 struct inode_record *rec,
 				 struct cache_tree *inode_cache,
@@ -1738,6 +1793,15 @@ static int repair_inode_backrefs(struct btrfs_root *root,
 	int repaired = 0;

 	list_for_each_entry_safe(backref, tmp, &rec->backrefs, list) {
+		if (!delete && rec->ino == root_dirid) {
+			if (!rec->found_inode_item) {
+				ret = create_inode_item(root, rec, backref, 1);
+				if (ret)
+					break;
+				repaired++;
+			}
+		}
+
 		/* Index 0 for root dir's are special, don't mess with it */
 		if (rec->ino == root_dirid && backref->index == 0)
 			continue;
@@ -1799,6 +1863,18 @@ static int repair_inode_backrefs(struct btrfs_root *root,
 			btrfs_commit_transaction(trans, root);
 			repaired++;
 		}
+
+		if (!delete && (backref->found_inode_ref &&
+				backref->found_dir_index &&
+				backref->found_dir_item &&
+				!(backref->errors & REF_ERR_INDEX_UNMATCH) &&
+				!rec->found_inode_item)) {
+			ret = create_inode_item(root, rec, backref, 0);
+			if (ret)
+				break;
+			repaired++;
+		}
+
 	}
 	return ret ? ret : repaired;
 }
-- 
1.8.3.1
[PATCH 08/11] Btrfs-progs: add ability to replace missing dir item/dir indexes
If we have everything except the dir item and dir index we can easily replace
them, so add this ability to btrfsck.  Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 cmds-check.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 38f8d11..be75dcb 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1772,6 +1772,33 @@ static int repair_inode_backrefs(struct btrfs_root *root,
 			}
 		}

+		if (!delete && (!backref->found_dir_index &&
+				!backref->found_dir_item &&
+				backref->found_inode_ref)) {
+			struct btrfs_trans_handle *trans;
+			struct btrfs_key location;
+
+			location.objectid = rec->ino;
+			location.type = BTRFS_INODE_ITEM_KEY;
+			location.offset = 0;
+
+			trans = btrfs_start_transaction(root, 1);
+			if (IS_ERR(trans)) {
+				ret = PTR_ERR(trans);
+				break;
+			}
+			fprintf(stderr, "adding missing dir index/item pair "
+				"for inode %llu\n",
+				(unsigned long long)rec->ino);
+			ret = btrfs_insert_dir_item(trans, root, backref->name,
+						    backref->namelen,
+						    backref->dir, &location,
+						    imode_to_type(rec->imode),
+						    backref->index);
+			BUG_ON(ret);
+			btrfs_commit_transaction(trans, root);
+			repaired++;
+		}
 	}
 	return ret ? ret : repaired;
 }
-- 
1.8.3.1
[PATCH] btrfs: fix typos in btrfs_check_super_valid
Copypaste errors in some messages and add few more missing macro accessors.

Signed-off-by: David Sterba <dste...@suse.cz>
---

This is for 3.18-rc, a fixup to "btrfs: use macro accessors in superblock
validation checks"

 fs/btrfs/disk-io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1bf9f897065d..7af9a1978a2f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3839,12 +3839,12 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 	 */
 	if (!IS_ALIGNED(btrfs_super_root(sb), 4096))
 		printk(KERN_WARNING "BTRFS: tree_root block unaligned: %llu\n",
-				sb->root);
+				btrfs_super_root(sb));
 	if (!IS_ALIGNED(btrfs_super_chunk_root(sb), 4096))
-		printk(KERN_WARNING "BTRFS: tree_root block unaligned: %llu\n",
-				sb->chunk_root);
+		printk(KERN_WARNING "BTRFS: chunk_root block unaligned: %llu\n",
+				btrfs_super_chunk_root(sb));
 	if (!IS_ALIGNED(btrfs_super_log_root(sb), 4096))
-		printk(KERN_WARNING "BTRFS: tree_root block unaligned: %llu\n",
+		printk(KERN_WARNING "BTRFS: log_root block unaligned: %llu\n",
 				btrfs_super_log_root(sb));

 	if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_UUID_SIZE) != 0) {
-- 
1.8.4.5
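The checks touched by the patch above test whether each root's byte offset is a multiple of 4096 and name the offending field in the warning. A user-space sketch of the same idea follows; IS_ALIGNED_POW2 and check_root_alignment are illustrative names rather than kernel symbols, and the macro assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* True when x is a multiple of a; a must be a power of two. */
#define IS_ALIGNED_POW2(x, a) ((((uint64_t)(x)) & ((uint64_t)(a) - 1)) == 0)

/*
 * Hypothetical helper: warn, naming the field that was checked, when
 * `bytenr` is not 4096-byte aligned. Returns 1 if aligned, 0 otherwise.
 */
int check_root_alignment(const char *name, uint64_t bytenr)
{
	assert(name != NULL);
	if (!IS_ALIGNED_POW2(bytenr, 4096)) {
		fprintf(stderr, "%s block unaligned: %llu\n", name,
			(unsigned long long)bytenr);
		return 0;
	}
	return 1;
}
```

Passing the field name as a parameter avoids the copy-paste problem the patch fixes, where three different checks all printed "tree_root".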
Compile BtrFS as kernel module
Hi,

I want to compile btrfs as a kernel module with special debug prints. Can anyone help me with the instructions to do that? I tried searching online but I couldn't find anything.

Regards,
Nishant
Re: Compile BtrFS as kernel module
2014-10-31 22:23 GMT+03:00 Nishant Agrawal <nragra...@cs.wisc.edu>:
> Hi,
>
> I want to compile btrfs as kernel module with special debug prints. Can
> anyone help me with the instructions to do that? I tried to do online search
> but I couldn't find anything.
>
> Regards,
> Nishant

Hm..
http://www.cyberciti.biz/tips/compiling-linux-kernel-module.html

You must first be able to configure and compile a vanilla kernel; only after that can you experiment with kernel development.
As far as I know, btrfs has special options in the kernel config for debugging.

-- 
Have a nice day,
Timofey.
[PATCH 1/1] btrfs: Deletion of unnecessary checks before six function calls
The following functions test whether their argument is NULL and then
return immediately.
* btrfs_free_path()
* free_extent_buffer()
* free_extent_map()
* free_extent_state()
* iput()
* kfree()

Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>
---
 fs/btrfs/dev-replace.c       |  3 +--
 fs/btrfs/extent_io.c         | 12 ++++--------
 fs/btrfs/file.c              |  6 ++----
 fs/btrfs/free-space-cache.c  |  7 +++----
 fs/btrfs/inode.c             |  6 ++----
 fs/btrfs/reada.c             |  3 +--
 fs/btrfs/relocation.c        |  3 +--
 fs/btrfs/tests/btrfs-tests.c |  3 +--
 fs/btrfs/tree-defrag.c       |  3 +--
 fs/btrfs/tree-log.c          |  6 ++----
 10 files changed, 18 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 6f662b3..3465029 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -183,8 +183,7 @@ no_valid_dev_replace_entry_found:
 	}

 out:
-	if (path)
-		btrfs_free_path(path);
+	btrfs_free_path(path);
 	return ret;
 }

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bf3f424..cfbf00a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -704,8 +704,7 @@ next:

 out:
 	spin_unlock(&tree->lock);
-	if (prealloc)
-		free_extent_state(prealloc);
+	free_extent_state(prealloc);

 	return 0;

@@ -1006,8 +1005,7 @@ hit_next:

 out:
 	spin_unlock(&tree->lock);
-	if (prealloc)
-		free_extent_state(prealloc);
+	free_extent_state(prealloc);

 	return err;

@@ -1223,8 +1221,7 @@ hit_next:

 out:
 	spin_unlock(&tree->lock);
-	if (prealloc)
-		free_extent_state(prealloc);
+	free_extent_state(prealloc);

 	return err;

@@ -4146,8 +4143,7 @@ int extent_readpages(struct extent_io_tree *tree,
 		__extent_readpages(tree, pagepool, nr, get_extent, &em_cached,
 				   &bio, 0, &bio_flags, READ);

-	if (em_cached)
-		free_extent_map(em_cached);
+	free_extent_map(em_cached);

 	BUG_ON(!list_empty(pages));
 	if (bio)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a18ceab..add07ce8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -677,10 +677,8 @@ next:
 		/* once for the tree*/
 		free_extent_map(em);
 	}
-	if (split)
-		free_extent_map(split);
-	if (split2)
-		free_extent_map(split2);
+	free_extent_map(split);
+	free_extent_map(split2);
 }

 /*
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 3384819..11883e2 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1943,8 +1943,7 @@ new_bitmap:

 out:
 	if (info) {
-		if (info->bitmap)
-			kfree(info->bitmap);
+		kfree(info->bitmap);
 		kmem_cache_free(btrfs_free_space_cachep, info);
 	}

@@ -3322,8 +3321,8 @@ again:

 	if (info)
 		kmem_cache_free(btrfs_free_space_cachep, info);
-	if (map)
-		kfree(map);
+
+	kfree(map);
 	return 0;
 }

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..7301b99 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -857,8 +857,7 @@ static u64 get_extent_allocation_hint(struct inode *inode, u64 start,
 			em = search_extent_mapping(em_tree, 0, 0);
 			if (em && em->block_start < EXTENT_MAP_LAST_BYTE)
 				alloc_hint = em->block_start;
-			if (em)
-				free_extent_map(em);
+			free_extent_map(em);
 		} else {
 			alloc_hint = em->block_start;
 			free_extent_map(em);
@@ -6573,8 +6572,7 @@ out:

 	trace_btrfs_get_extent(root, em);

-	if (path)
-		btrfs_free_path(path);
+	btrfs_free_path(path);
 	if (trans) {
 		ret = btrfs_end_transaction(trans, root);
 		if (!err)
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index b63ae20..ec8eb49 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -731,8 +731,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,
 	else if (eb)
 		__readahead_hook(fs_info->extent_root, eb, eb->start, ret);

-	if (eb)
-		free_extent_buffer(eb);
+	free_extent_buffer(eb);

 	return 1;

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 74257d6..f87a5ee 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4158,8 +4158,7 @@ out:
 	btrfs_end_transaction(trans, root);
 	btrfs_btree_balance_dirty(root);
 	if (err) {
-		if (inode)
-			iput(inode);
+
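The reasoning in the patch description above can be illustrated outside the kernel: when a release function already returns early on NULL, a guard at the call site adds nothing. This is a user-space sketch with made-up names (path_buf, path_buf_free, lookup), using free() where the kernel code would use kfree().

```c
#include <assert.h>
#include <stdlib.h>

struct path_buf {	/* hypothetical stand-in for a heap-allocated object */
	char *data;
};

/* Like the helpers listed in the patch: a NULL argument is a no-op. */
void path_buf_free(struct path_buf *p)
{
	if (!p)
		return;
	free(p->data);	/* free(NULL) is itself defined to do nothing */
	free(p);
}

/* Returns 0 on success, -1 when bailing out early or on allocation failure. */
int lookup(int fail_early)
{
	struct path_buf *p = NULL;
	int ret = -1;

	if (fail_early)
		goto out;	/* p is still NULL here */

	p = calloc(1, sizeof(*p));
	if (!p)
		goto out;
	ret = 0;
out:
	path_buf_free(p);	/* no "if (p)" guard needed */
	return ret;
}
```

Both exit paths reach the same cleanup label, and the NULL case is handled in one place inside the release function.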
Re: [Cocci] [PATCH 1/1] btrfs: Deletion of unnecessary checks before six function calls
On Fri, 31 Oct 2014, SF Markus Elfring wrote:

> The following functions test whether their argument is NULL and then
> return immediately.
> * btrfs_free_path()
> * free_extent_buffer()
> * free_extent_map()
> * free_extent_state()
> * iput()
> * kfree()
>
> Thus the test around the call is not needed.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>
> ---
>  fs/btrfs/dev-replace.c       |  3 +--
>  fs/btrfs/extent_io.c         | 12 ++++--------
>  fs/btrfs/file.c              |  6 ++----
>  fs/btrfs/free-space-cache.c  |  7 +++----
>  fs/btrfs/inode.c             |  6 ++----
>  fs/btrfs/reada.c             |  3 +--
>  fs/btrfs/relocation.c        |  3 +--
>  fs/btrfs/tests/btrfs-tests.c |  3 +--
>  fs/btrfs/tree-defrag.c       |  3 +--
>  fs/btrfs/tree-log.c          |  6 ++----
>  10 files changed, 18 insertions(+), 34 deletions(-)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 6f662b3..3465029 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -183,8 +183,7 @@ no_valid_dev_replace_entry_found:
>  	}
>
>  out:
> -	if (path)
> -		btrfs_free_path(path);
> +	btrfs_free_path(path);

It appears to be statically apparent whether btrfs_free_path is needed or not. The code could be changed both not to have the test and not to have the jump and call to btrfs_free_path. This is probably the case for the other occurrences next to labels.

julia

>  	return ret;
>  }
>
> [...]