Re: Sparse Warnings about locks in extent-tree.c
On Mon, Sep 22, 2014 at 09:57:35PM -0400, nick wrote: Hello Btfs Developers, I am new so am unsure of how to fix this but we are hitting some sparse warnings about unlock/lock is having a wrong count when exiting certain functions in extent-tree.c. I will paste the warnings below for you guys. Hope this is of help. Cheers, Nick fs/btrfs/extent-tree.c:6386:39: warning: context imbalance in 'btrfs_lock_cluster' - wrong count at exit fs/btrfs/extent-tree.c:6663:44: warning: context imbalance in 'find_free_extent' - unexpected unlock fs/btrfs/extent-tree.c:8810:9: warning: context imbalance in 'btrfs_put_block_group_cache' - wrong count at exit Hi Nick, What program is producing these warning? It looks like a static code analysis tool or something. thanks, -liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Corrupted metadata. 'btrfs check' fails repair due to an assertion failure: '!(root-ref_cows trans-transid != root-last_trans)'
I forgot the attachment. Here it is. # btrfs-debug-tree -b 13415158087680 /dev/sdh leaf 13415158087680 items 90 free space 8577 generation 140065 owner 5007 fs uuid 20ccaf09-54ea-486e-9495-9dc91b933e9c chunk uuid f5a0dea1-b250-4ea4-bf7f-50d30401a708 item 0 key (7516 DIR_INDEX 102) itemoff 16232 itemsize 51 location key (7617 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082290.dng_original item 1 key (7516 DIR_INDEX 103) itemoff 16190 itemsize 42 location key (7618 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082293.dng item 2 key (7516 DIR_INDEX 104) itemoff 16139 itemsize 51 location key (7619 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082293.dng_original item 3 key (7516 DIR_INDEX 105) itemoff 16097 itemsize 42 location key (7620 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082295.dng item 4 key (7516 DIR_INDEX 106) itemoff 16046 itemsize 51 location key (7621 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082295.dng_original item 5 key (7516 DIR_INDEX 107) itemoff 16004 itemsize 42 location key (7622 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082299.dng item 6 key (7516 DIR_INDEX 108) itemoff 15953 itemsize 51 location key (7623 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082299.dng_original item 7 key (7516 DIR_INDEX 536871021) itemoff 15911 itemsize 42 location key (7624 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082321.dng item 8 key (7516 DIR_INDEX 110) itemoff 15860 itemsize 51 location key (7625 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082321.dng_original item 9 key (7516 DIR_INDEX 111) itemoff 15818 itemsize 42 location key (7626 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082328.dng item 10 key (7516 DIR_INDEX 112) itemoff 15767 itemsize 51 location key (7627 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082328.dng_original item 11 key (7516 DIR_INDEX 113) itemoff 15725 itemsize 42 location key (7628 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082349.dng item 12 key (7516 DIR_INDEX 114) itemoff 15674 itemsize 51 location key (7629 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082349.dng_original item 13 key (7516 DIR_INDEX 115) itemoff 15632 itemsize 42 location key (7630 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082350.dng item 14 key (7516 DIR_INDEX 116) itemoff 15581 itemsize 51 location key (7631 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082350.dng_original item 15 key (7516 DIR_INDEX 117) itemoff 15539 itemsize 42 location key (7632 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082351.dng item 16 key (7516 DIR_INDEX 118) itemoff 15488 itemsize 51 location key (7633 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082351.dng_original item 17 key (7516 DIR_INDEX 119) itemoff 15446 itemsize 42 location key (7634 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082356.dng item 18 key (7516 DIR_INDEX 120) itemoff 15395 itemsize 51 location key (7635 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082356.dng_original item 19 key (7516 DIR_INDEX 121) itemoff 15353 itemsize 42 location key (7636 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082357.dng item 20 key (7516 DIR_INDEX 122) itemoff 15302 itemsize 51 location key (7637 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082357.dng_original item 21 key (7516 DIR_INDEX 123) itemoff 15260 itemsize 42 location key (7638 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082358.dng item 22 key (7516 DIR_INDEX 124) itemoff 15209 itemsize 51 location key (7639 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082358.dng_original item 23 key (7516 DIR_INDEX 125) itemoff 15167 itemsize 42 location key (7640 INODE_ITEM 0) type FILE namelen 12 datalen 0 name: P4082359.dng item 24 key (7516 DIR_INDEX 126) itemoff 15116 itemsize 51 location key (7641 INODE_ITEM 0) type FILE namelen 21 datalen 0 name: P4082359.dng_original item 25 key (7516 DIR_INDEX 127) itemoff 15074 itemsize 42
Re: [PATCH v4] btrfs-progs: fix page align issue for lzo compress in restore
On Tue, 2014-09-23 at 10:25 +0800, Gui Hecheng wrote: When runing restore under lzo compression, bad compress length problems are encountered. It is because there is a page align problem with the @decompress_lzo, as follows: |--| ||-| |--|...|--| page ^page page | 3 bytes left When lzo compress pages im RAM, lzo will ensure that the 4 bytes len will be in one page as a whole. There is a situation that 3 (or less) bytes are left at the end of a page, and then the 4 bytes len is stored at the start of the next page. But the @decompress_lzo doesn't goto the start of the next page and continue to read the next 4 bytes which is across two pages, so a random value is fetched as a bad compress length. So we check page alignment every time before we are going to fetch the next @len and after the former piece of data is decompressed. If the current page that we reach has less than 4 bytes left, then we should fetch the next @len at the start of next page. Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com Reviewed-by: Marc Dietrich marvi...@gmx.de --- changelog v1-v2: adopt alignment check method suggested by Marc v2-v3: make code more readable v3-v4: keep type safety --- cmds-restore.c | 29 +++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/cmds-restore.c b/cmds-restore.c index 38a131e..fa5d5d1 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -56,7 +56,10 @@ static int get_xattrs = 0; static int dry_run = 0; #define LZO_LEN 4 -#define PAGE_CACHE_SIZE 4096 +#define PAGE_CACHE_SIZE 4096UL +#define PAGE_CACHE_MASK (~(PAGE_CACHE_SIZE - 1)) +#define PAGE_CACHE_ALIGN(addr) (((addr) + PAGE_CACHE_SIZE - 1) \ + PAGE_CACHE_MASK) #define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3) static int decompress_zlib(char *inbuf, char *outbuf, u64 compress_len, @@ -93,6 +96,28 @@ static inline size_t read_compress_length(unsigned char *buf) return le32_to_cpu(dlen); } +static void align_if_need(size_t *tot_in, size_t *in_len) +{ + size_t tot_in_aligned; + size_t bytes_left; + + tot_in_aligned = PAGE_CACHE_ALIGN(*tot_in); + bytes_left = tot_in_aligned - *tot_in; + + if (bytes_left = LZO_LEN) + return; + + /* + * The LZO_LEN bytes is guaranteed to be + * in one page as a whole, so if a page + * has fewer than LZO_LEN bytes left, + * the LZO_LEN bytes should be fetched + * at the start of the next page + */ + *in_len += bytes_left; + *tot_in = tot_in_aligned; +} + static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len, u64 *decompress_len) { @@ -135,8 +160,8 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len, } out_len += new_len; outbuf += new_len; + align_if_need(tot_in, in_len); inbuf += in_len; - tot_in += in_len; } *decompress_len = out_len; Sorry, please scratch this one, the comments should be reformated. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] btrfs-progs: fix page align issue for lzo compress in restore
When runing restore under lzo compression, bad compress length problems are encountered. It is because there is a page align problem with the @decompress_lzo, as follows: |--| ||-| |--|...|--| page ^page page | 3 bytes left When lzo compress pages im RAM, lzo will ensure that the 4 bytes len will be in one page as a whole. There is a situation that 3 (or less) bytes are left at the end of a page, and then the 4 bytes len is stored at the start of the next page. But the @decompress_lzo doesn't goto the start of the next page and continue to read the next 4 bytes which is across two pages, so a random value is fetched as a bad compress length. So we check page alignment every time before we are going to fetch the next @len and after the former piece of data is decompressed. If the current page that we reach has less than 4 bytes left, then we should fetch the next @len at the start of next page. Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com Reviewed-by: Marc Dietrich marvi...@gmx.de --- changelog v1-v2: adopt alignment check method suggested by Marc v2-v3: make code more readable v3-v4: keep type safety reformat comments --- cmds-restore.c | 27 +-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/cmds-restore.c b/cmds-restore.c index e09acc4..1fe2df0 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -56,7 +56,10 @@ static int get_xattrs = 0; static int dry_run = 0; #define LZO_LEN 4 -#define PAGE_CACHE_SIZE 4096 +#define PAGE_CACHE_SIZE 4096UL +#define PAGE_CACHE_MASK (~(PAGE_CACHE_SIZE - 1)) +#define PAGE_CACHE_ALIGN(addr) (((addr) + PAGE_CACHE_SIZE - 1) \ +PAGE_CACHE_MASK) #define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3) static int decompress_zlib(char *inbuf, char *outbuf, u64 compress_len, @@ -93,6 +96,26 @@ static inline size_t read_compress_length(unsigned char *buf) return le32_to_cpu(dlen); } +static void align_if_need(size_t *tot_in, size_t *in_len) +{ + size_t tot_in_aligned; + size_t bytes_left; + + tot_in_aligned = PAGE_CACHE_ALIGN(*tot_in); + bytes_left = tot_in_aligned - *tot_in; + + if (bytes_left = LZO_LEN) + return; + + /* +* The LZO_LEN bytes is guaranteed to be in one page as a whole, +* so if a page has fewer than LZO_LEN bytes left, the LZO_LEN bytes +* should be fetched at the start of the next page +*/ + *in_len += bytes_left; + *tot_in = tot_in_aligned; +} + static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len, u64 *decompress_len) { @@ -135,8 +158,8 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len, } out_len += new_len; outbuf += new_len; + align_if_need(tot_in, in_len); inbuf += in_len; - tot_in += in_len; } *decompress_len = out_len; -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 11/12] security, crypto: LLVMLinux: Remove VLAIS from ima_crypto.c
On 23/09/14 07:42, beh...@converseincode.com wrote: From: Behan Webster beh...@converseincode.com Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This patch allocates the appropriate amount of memory using a char array using the SHASH_DESC_ON_STACK macro. The new code can be compiled with both gcc and clang. Signed-off-by: Behan Webster beh...@converseincode.com Reviewed-by: Mark Charlebois charl...@gmail.com Reviewed-by: Jan-Simon Möller dl...@gmx.de Acked-by: Herbert Xu herb...@gondor.apana.org.au Cc: t...@linutronix.de Looks good. Thanks. Acked-by: Dmitry Kasatkin d.kasat...@samsung.com --- security/integrity/ima/ima_crypto.c | 47 +++-- 1 file changed, 19 insertions(+), 28 deletions(-) diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c index 0bd7328..e35f5d9 100644 --- a/security/integrity/ima/ima_crypto.c +++ b/security/integrity/ima/ima_crypto.c @@ -380,17 +380,14 @@ static int ima_calc_file_hash_tfm(struct file *file, loff_t i_size, offset = 0; char *rbuf; int rc, read = 0; - struct { - struct shash_desc shash; - char ctx[crypto_shash_descsize(tfm)]; - } desc; + SHASH_DESC_ON_STACK(shash, tfm); - desc.shash.tfm = tfm; - desc.shash.flags = 0; + shash-tfm = tfm; + shash-flags = 0; hash-length = crypto_shash_digestsize(tfm); - rc = crypto_shash_init(desc.shash); + rc = crypto_shash_init(shash); if (rc != 0) return rc; @@ -420,7 +417,7 @@ static int ima_calc_file_hash_tfm(struct file *file, break; offset += rbuf_len; - rc = crypto_shash_update(desc.shash, rbuf, rbuf_len); + rc = crypto_shash_update(shash, rbuf, rbuf_len); if (rc) break; } @@ -429,7 +426,7 @@ static int ima_calc_file_hash_tfm(struct file *file, kfree(rbuf); out: if (!rc) - rc = crypto_shash_final(desc.shash, hash-digest); + rc = crypto_shash_final(shash, hash-digest); return rc; } @@ -487,18 +484,15 @@ static int ima_calc_field_array_hash_tfm(struct ima_field_data *field_data, struct ima_digest_data *hash, struct crypto_shash *tfm) { - struct { - struct shash_desc shash; - char ctx[crypto_shash_descsize(tfm)]; - } desc; + SHASH_DESC_ON_STACK(shash, tfm); int rc, i; - desc.shash.tfm = tfm; - desc.shash.flags = 0; + shash-tfm = tfm; + shash-flags = 0; hash-length = crypto_shash_digestsize(tfm); - rc = crypto_shash_init(desc.shash); + rc = crypto_shash_init(shash); if (rc != 0) return rc; @@ -508,7 +502,7 @@ static int ima_calc_field_array_hash_tfm(struct ima_field_data *field_data, u32 datalen = field_data[i].len; if (strcmp(td-name, IMA_TEMPLATE_IMA_NAME) != 0) { - rc = crypto_shash_update(desc.shash, + rc = crypto_shash_update(shash, (const u8 *) field_data[i].len, sizeof(field_data[i].len)); if (rc) @@ -518,13 +512,13 @@ static int ima_calc_field_array_hash_tfm(struct ima_field_data *field_data, data_to_hash = buffer; datalen = IMA_EVENT_NAME_LEN_MAX + 1; } - rc = crypto_shash_update(desc.shash, data_to_hash, datalen); + rc = crypto_shash_update(shash, data_to_hash, datalen); if (rc) break; } if (!rc) - rc = crypto_shash_final(desc.shash, hash-digest); + rc = crypto_shash_final(shash, hash-digest); return rc; } @@ -565,15 +559,12 @@ static int __init ima_calc_boot_aggregate_tfm(char *digest, { u8 pcr_i[TPM_DIGEST_SIZE]; int rc, i; - struct { - struct shash_desc shash; - char ctx[crypto_shash_descsize(tfm)]; - } desc; + SHASH_DESC_ON_STACK(shash, tfm); - desc.shash.tfm = tfm; - desc.shash.flags = 0; + shash-tfm = tfm; + shash-flags = 0; - rc = crypto_shash_init(desc.shash); + rc = crypto_shash_init(shash); if (rc != 0) return rc; @@ -581,10 +572,10 @@ static int __init ima_calc_boot_aggregate_tfm(char *digest, for (i = TPM_PCR0; i TPM_PCR8; i++) { ima_pcrread(i, pcr_i); /* now accumulate with current aggregate */ - rc = crypto_shash_update(desc.shash, pcr_i, TPM_DIGEST_SIZE); + rc = crypto_shash_update(shash, pcr_i, TPM_DIGEST_SIZE); }
Re: [PATCH 1/2] Btrfs: add missing end_page_writeback on submit_extent_page failure
On Mon, Sep 22, 2014 at 05:41:04PM +0100, Filipe Manana wrote: If submit_extent_page() fails in write_one_eb(), we end up with the current page not marked dirty anymore, unlocked and marked for writeback. But we never end up calling end_page_writeback() against the page, which will make calls to filemap_fdatawait_range (e.g. at transaction commit time) hang forever waiting for the writeback bit to be cleared from the page. Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 3af4966..91f866c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3668,6 +3668,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, if (ret) { set_bit(EXTENT_BUFFER_IOERR, eb-bflags); SetPageError(p); + end_page_writeback(p); This is not always true, and it depends on the place where we get error in submit_extent_page(), whether it has built and submitted a bio, if it's true, case can be different as bio_endio will be called, and calling end_page_writeback() again will end up with panic. thanks, -liubo if (atomic_sub_and_test(num_pages - i, eb-io_pages)) end_extent_buffer_writeback(eb); ret = -EIO; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
On 2014-09-22 16:51, Stefan G. Weichinger wrote: Am 20.09.2014 um 11:32 schrieb Duncan: What I do as part of my regular backup regime, is every few kernel cycles I wipe the (first level) backup and do a fresh mkfs.btrfs, activating new optional features as I believe appropriate. Then I boot to the new backup and run a bit to test it, then wipe the normal working copy and do a fresh mkfs.btrfs on it, again with the new optional features enabled that I want. Is re-creating btrfs-filesystems *recommended* in any way? Does that actually make a difference in the fs-structure? I would recommend it, there are some newer features that you can only set at mkfs time. Quite often, when a new feature is implemented, it is some time before things are such that it can be enabled online, and even then that doesn't convert anything until it is rewritten. So far I assumed it was enough to keep the kernel up2date, use current (stable) btrfs-progs and run some scrub every week or so (not to mention backups .. if it ain't backed up, it was/isn't important). Stefan smime.p7s Description: S/MIME Cryptographic Signature
mount problem
Hi all, we're testing BTRFS on our Debian server. After a lot of operations simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume the kernel logs these messages: [73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs: wr 33036, rd 0, flush 0, corrupt 2806, gen 0 [73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs: wr 244165, rd 0, flush 0, corrupt 1, gen 4 Everything seems to work nice but I'm courious to know what these messages mean (in particular what do gen and corrupt mean?). # uname -a Linux dub 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux # btrfs --version Btrfs v3.16 # btrfs fi show Label: 'btrfs_multiappliance' uuid: 3452ffdd-c09b-43dd-9adb-cffde8518a72 Total devices 2 FS bytes used 20.03GiB devid1 size 1.82TiB used 24.03GiB path /dev/etherd/e30.20 devid2 size 3.64TiB used 24.03GiB path /dev/etherd/e60.28 # btrfs fi df /media/multiapp Data, RAID1: total=22.00GiB, used=20.01GiB System, RAID1: total=32.00MiB, used=16.00KiB Metadata, RAID1: total=2.00GiB, used=21.23MiB unknown, single: total=16.00MiB, used=0.00 # dmesg [82932.655078] BTRFS info (device etherd/e30.20): disk space caching is enabled [82932.678380] BTRFS: bdev /dev/etherd/e30.20 errs: wr 33036, rd 0, flush 0, corrupt 2806, gen 0 [82932.678388] BTRFS: bdev /dev/etherd/e60.28 errs: wr 244165, rd 0, flush 0, corrupt 1, gen 4 -- Thanks in advance, Simone Ferretti -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.
On 09/23/2014 01:40 AM, Qu Wenruo wrote: [BUG] Originally when mount btrfs with -o subvol= mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with subvol= mount option. Thanks for working on this. Eric Sandeen (cc'd) was trying out something similar recently, so I want to make sure this doesn't conflict with his ideas. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfstests: btrfs: add test regarding clearing compression flag/property
On Thu, 11 Sep 2014 11:45:34 +0100, Filipe Manana wrote: Regression test for btrfs where removing the flag FS_COMPR_FL (chattr -c) from an inode wouldn't clear its compression property. This was fixed in the following linux kernel patch: Btrfs: add missing compression property remove in btrfs_ioctl_setflags Signed-off-by: Filipe Manana fdman...@suse.com I would have expected this to fail with MOUNT_OPTIONS=-o compress=no, but it looks as though Btrfs stores the attributes unconditionally. Either way this change looks good to me. Reviewed-by: David Disseldorp dd...@suse.de Cheers, David -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs send does not work from readonly device
Let's say we have a snapshot called snapshot on our device /dev/sdb. Now we boot a qemu machine and attatch the disk read only (since we do not want to put our data at risk in the vm, and read only should be enough to get our snapshot data). I used qemu-system-x86_64 -drive -file=/dev/sdb,if=scsi,readonly -hda path_to_guest_operating_system When we are in the qemu instance (suppose our device /dev/sdb is mounted on /mnt/import), we want to import our data now using btrfs send/receive: btrfs send /mnt/import/snapshot | btrfs receive / we get the following problems in syslog due to the fact that our device is read only: delayed_refs has NO entry BTRFS warning (device sdb): btrfs_update_root:151: Aborting unsused transaction(No space left) BTRFS warning (device sdb): Skipping commit of aborted transaction. BTRFS warning (device sdb): cleanup_transaction:1547: Aborting unused transaction (No space left) Apart from the snapshot folder being created in / nothing happens, the process simply hangs forever, with no actual data getting imported. Is that supposed to be that way? Why is readonly not enough to import data using btrfs send? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Btrfs: add missing end_page_writeback on submit_extent_page failure
On Tue, Sep 23, 2014 at 11:14 AM, Liu Bo bo.li@oracle.com wrote: On Mon, Sep 22, 2014 at 05:41:04PM +0100, Filipe Manana wrote: If submit_extent_page() fails in write_one_eb(), we end up with the current page not marked dirty anymore, unlocked and marked for writeback. But we never end up calling end_page_writeback() against the page, which will make calls to filemap_fdatawait_range (e.g. at transaction commit time) hang forever waiting for the writeback bit to be cleared from the page. Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 3af4966..91f866c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3668,6 +3668,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, if (ret) { set_bit(EXTENT_BUFFER_IOERR, eb-bflags); SetPageError(p); + end_page_writeback(p); This is not always true, and it depends on the place where we get error in submit_extent_page(), whether it has built and submitted a bio, if it's true, case can be different as bio_endio will be called, and calling end_page_writeback() again will end up with panic. No, it's always true when the caller is write_one_eb(). If submit_extent_page() returns an error, we're sure that the page wasn't added to a bio, because bio_ret is always not NULL (but *bio_ret can be). Also, if we call submit_one_bio() in submit_extent_page() (first if statement) , we know that the bio doesn't contain the page. thanks thanks, -liubo if (atomic_sub_and_test(num_pages - i, eb-io_pages)) end_extent_buffer_writeback(eb); ret = -EIO; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
Am 23.09.2014 um 14:08 schrieb Austin S Hemmelgarn: On 2014-09-22 16:51, Stefan G. Weichinger wrote: Is re-creating btrfs-filesystems *recommended* in any way? Does that actually make a difference in the fs-structure? I would recommend it, there are some newer features that you can only set at mkfs time. Quite often, when a new feature is implemented, it is some time before things are such that it can be enabled online, and even then that doesn't convert anything until it is rewritten. What features for example? I created my main btrfs a few months ago and would like to avoid recreating it as this would mean restoring my root-fs on my main workstation. Although I would do it if it is worth it ;-) I assume I could read some kind of version number out of the superblock or so? btrfs-show-super ? S -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
On 2014-09-23 09:06, Stefan G. Weichinger wrote: Am 23.09.2014 um 14:08 schrieb Austin S Hemmelgarn: On 2014-09-22 16:51, Stefan G. Weichinger wrote: Is re-creating btrfs-filesystems *recommended* in any way? Does that actually make a difference in the fs-structure? I would recommend it, there are some newer features that you can only set at mkfs time. Quite often, when a new feature is implemented, it is some time before things are such that it can be enabled online, and even then that doesn't convert anything until it is rewritten. What features for example? Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features: mixed-bg- mixed data and metadata block groups extref - increased hard-link limit per file to 65536 raid56 - raid56 extended format skinny-metadata - reduced size metadata extent refs no-holes- no explicit hole extents for files mixed-bg is something that you generally wouldn't want to change after mkfs. extref can be enabled online, and the filesystem metadata gets updated as-needed, and dosen't provide any real performance improvement (but is needed for some mail servers that have HUGE mail-queues) I don't know anything about the raid56 option, but there isn't any way to change it after mkfs. skinyy-metadata can be changed online, and the format gets updated on rewrite of each metadata block. This one does provide a performance improvement (stat() in particular runs noticeably faster). You should probably enable this if it isn't already enabled, even if you don't recreate your filesystem. no-holes cannot currently be changed online, and is a very recent addition (post v3.14 btrfs-progs I believe) that provides improved performance for sparse files (which is particularly useful if you are doing things with fixed size virtual machine disk images). It's this last one that prompted me personally to recreate my filesystems most recently, as I use sparse files to save space as much as possible. I created my main btrfs a few months ago and would like to avoid recreating it as this would mean restoring my root-fs on my main workstation. Although I would do it if it is worth it ;-) I assume I could read some kind of version number out of the superblock or so? btrfs-show-super ? AFAIK there isn't really any 'version number' that has any meaning in the superblock (except for telling the kernel that it uses the stable disk layout), however, there are flag bits that you can look for (compat_flags, compat_ro_flags, and incompat_flags). I'm not 100% certain what each bit means, but on my system with a only 1 month old BTRFS filesystem, with extref, skinny-metadata, and no-holes turned on, i have compat_flags: 0x0, compat_ro_flags: 0x0, and incompat_flags: 0x16b. The other potentially significant thing is that the default nodesize/leafsize has changed recently from 4096 to 16384, as that gives somewhat better performance for most use cases. smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 1/2] Btrfs: add missing end_page_writeback on submit_extent_page failure
On Tue, Sep 23, 2014 at 02:03:07PM +0100, Filipe David Manana wrote: On Tue, Sep 23, 2014 at 11:14 AM, Liu Bo bo.li@oracle.com wrote: On Mon, Sep 22, 2014 at 05:41:04PM +0100, Filipe Manana wrote: If submit_extent_page() fails in write_one_eb(), we end up with the current page not marked dirty anymore, unlocked and marked for writeback. But we never end up calling end_page_writeback() against the page, which will make calls to filemap_fdatawait_range (e.g. at transaction commit time) hang forever waiting for the writeback bit to be cleared from the page. Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 3af4966..91f866c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3668,6 +3668,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, if (ret) { set_bit(EXTENT_BUFFER_IOERR, eb-bflags); SetPageError(p); + end_page_writeback(p); This is not always true, and it depends on the place where we get error in submit_extent_page(), whether it has built and submitted a bio, if it's true, case can be different as bio_endio will be called, and calling end_page_writeback() again will end up with panic. No, it's always true when the caller is write_one_eb(). If submit_extent_page() returns an error, we're sure that the page wasn't added to a bio, because bio_ret is always not NULL (but *bio_ret can be). Also, if we call submit_one_bio() in submit_extent_page() (first if statement) , we know that the bio doesn't contain the page. I see, you're right. Reviewed-by: Liu Bo bo.li@oracle.com thanks, -liubo thanks thanks, -liubo if (atomic_sub_and_test(num_pages - i, eb-io_pages)) end_extent_buffer_writeback(eb); ret = -EIO; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
Am 23.09.2014 um 15:38 schrieb Austin S Hemmelgarn: On 2014-09-23 09:06, Stefan G. Weichinger wrote: What features for example? Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features: mixed-bg- mixed data and metadata block groups extref- increased hard-link limit per file to 65536 raid56- raid56 extended format skinny-metadata- reduced size metadata extent refs no-holes- no explicit hole extents for files mixed-bg is something that you generally wouldn't want to change after mkfs. extref can be enabled online, and the filesystem metadata gets updated as-needed, and dosen't provide any real performance improvement (but is needed for some mail servers that have HUGE mail-queues) ok, not needed here I don't know anything about the raid56 option, but there isn't any way to change it after mkfs. not needed in my systems. skinyy-metadata can be changed online, and the format gets updated on rewrite of each metadata block. This one does provide a performance improvement (stat() in particular runs noticeably faster). You should probably enable this if it isn't already enabled, even if you don't recreate your filesystem. So this is done via btrfstune, right? I will give that a try, for my rootfs it doesn't allow me right now as it is obviously mounted (live-cd, right?). no-holes cannot currently be changed online, and is a very recent addition (post v3.14 btrfs-progs I believe) that provides improved performance for sparse files (which is particularly useful if you are doing things with fixed size virtual machine disk images). Yes, I have some of those! AFAIK there isn't really any 'version number' that has any meaning in the superblock (except for telling the kernel that it uses the stable disk layout), however, there are flag bits that you can look for (compat_flags, compat_ro_flags, and incompat_flags). I'm not 100% certain what each bit means, but on my system with a only 1 month old BTRFS filesystem, with extref, skinny-metadata, and no-holes turned on, i have compat_flags: 0x0, compat_ro_flags: 0x0, and incompat_flags: 0x16b. The other potentially significant thing is that the default nodesize/leafsize has changed recently from 4096 to 16384, as that gives somewhat better performance for most use cases. I have the 16k for both already. Thanks for your explanations, I will dig into it as soon as I find the time. Seems I have to backup/restore quite some stuff ;-) Stefan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Sparse Warnings about locks in extent-tree.c
On Tue, Sep 23, 2014 at 02:45:44PM +0800, Liu Bo wrote: On Mon, Sep 22, 2014 at 09:57:35PM -0400, nick wrote: Hello Btfs Developers, I am new so am unsure of how to fix this but we are hitting some sparse warnings about unlock/lock is having a wrong count when exiting certain functions in extent-tree.c. I will paste the warnings below for you guys. Hope this is of help. Cheers, Nick fs/btrfs/extent-tree.c:6386:39: warning: context imbalance in 'btrfs_lock_cluster' - wrong count at exit fs/btrfs/extent-tree.c:6663:44: warning: context imbalance in 'find_free_extent' - unexpected unlock fs/btrfs/extent-tree.c:8810:9: warning: context imbalance in 'btrfs_put_block_group_cache' - wrong count at exit What program is producing these warning? It looks like a static code analysis tool or something. https://sparse.wiki.kernel.org/index.php/Main_Page It understands a few semantics like the locking and reports when there's an unannotated function that has imbalanced locking, like the errors above. Any static analyzer has some ratio of false positives so not every report means there's a bug. One would expect that a function 'btrfs_lock_cluster' would act as a lock and leaves some locks held at exit. The fix is to annotate it a after review. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: be aware of btree inode write errors to avoid fs corruption
On Mon, Sep 22, 2014 at 05:41:05PM +0100, Filipe Manana wrote: While we have a transaction ongoing, the VM might decide at any time to call btree_inode-i_mapping-a_ops-writepages(), which will start writeback of dirty pages belonging to btree nodes/leafs. This call might return an error or the writeback might finish with an error before we attempt to commit the running transaction. If this happens, we might have no way of knowing that such error happened when we are committing the transaction - because the pages might no longer be marked dirty nor tagged for writeback (if a subsequent modification to the extent buffer didn't happen before the transaction commit) which makes filemap_fdata[write|wait]_range unable to find such pages (even if they're marked with SetPageError). So if this happens we must abort the transaction, otherwise we commit a super block with btree roots that point to btree nodes/leafs whose content on disk is invalid - either garbage or the content of some node/leaf from a past generation that got cowed or deleted and is no longer valid (for this later case we end up getting error messages like parent transid verify failed on 10826481664 wanted 25748 found 29562 when reading btree nodes/leafs from disk). Good catch! Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's i_mapping would not be enough because we need to distinguish between log tree extents (not fatal) vs non-log tree extents (fatal) and because the next call to filemap_fdatawait_range() will catch and clear such errors in the mapping - and that call might be from a log sync and not from a transaction commit, which means we would not know about the error at transaction commit time. Also, checking for the eb flag EXTENT_BUFFER_IOERR at transaction commit time isn't done and would not be completely reliable, as the eb might be removed from memory and read back when trying to get it, which clears that flag right before reading the eb's pages from disk, making us not know about the previous write error. Using the BTRFS_INODE_BTREE_IO_ERR and BTRFS_INODE_BTREE_LOG_IO_ERR inode flags also makes us achieve the goal of AS_EIO/AS_ENOSPC when writepages() returns success, started writeback for all dirty pages and before filemap_fdatawait_range() is called, the writeback for all dirty pages had already finished with errors - because we were not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return success, as it could not know that writeback errors happened (the pages were no longer tagged for writeback). But there is a problem when the extent buffer with IO error is COWed and deleted, but the BTRFS_INODE_BTREE_IO_ERR flag still remains in btree_inode's runtime_flags, we'd still run into cleanup_transaction() as this patch expects. So I think we need to remove that BTREE_IO_ERR flag in this case. Well, I notice that you don't clear rest pages' DIRTY in the error case, so it can lead to hitting the BUG_ON in btrfs_release_extent_buffer_page() when the eb with IO error is COWed by a new good eb and gets itself freed then. I'm making a patch to add missing clear_page_dirty_for_io(). thanks, -liubo Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/btrfs_inode.h | 2 ++ fs/btrfs/extent_io.c | 69 +++--- fs/btrfs/transaction.c | 20 --- fs/btrfs/transaction.h | 3 +-- fs/btrfs/tree-log.c| 13 ++ 5 files changed, 93 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 3511031..dbe37dc 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -44,6 +44,8 @@ #define BTRFS_INODE_IN_DELALLOC_LIST 9 #define BTRFS_INODE_READDIO_NEED_LOCK10 #define BTRFS_INODE_HAS_PROPS11 +#define BTRFS_INODE_BTREE_IO_ERR 12 +#define BTRFS_INODE_BTREE_LOG_IO_ERR 13 /* in memory btrfs inode */ struct btrfs_inode { diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 91f866c..33b113b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -20,6 +20,7 @@ #include locking.h #include rcu-string.h #include backref.h +#include transaction.h static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -3606,6 +3607,68 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb) wake_up_bit(eb-bflags, EXTENT_BUFFER_WRITEBACK); } +static void set_btree_ioerr(struct page *page, int err) +{ + struct extent_buffer *eb = (struct extent_buffer *)page-private; + const u64 start = eb-start; + const u64 end = eb-start + eb-len - 1; + struct btrfs_fs_info *fs_info = eb-fs_info; + int ret; + + set_bit(EXTENT_BUFFER_IOERR, eb-bflags); + SetPageError(page); + + /* + * If writeback for a btree extent that doesn't belong to a log tree +
Re: Sparse Warnings about locks in extent-tree.c
On Tue, Sep 23, 2014 at 04:01:41PM +0200, David Sterba wrote: On Tue, Sep 23, 2014 at 02:45:44PM +0800, Liu Bo wrote: On Mon, Sep 22, 2014 at 09:57:35PM -0400, nick wrote: Hello Btfs Developers, I am new so am unsure of how to fix this but we are hitting some sparse warnings about unlock/lock is having a wrong count when exiting certain functions in extent-tree.c. I will paste the warnings below for you guys. Hope this is of help. Cheers, Nick fs/btrfs/extent-tree.c:6386:39: warning: context imbalance in 'btrfs_lock_cluster' - wrong count at exit fs/btrfs/extent-tree.c:6663:44: warning: context imbalance in 'find_free_extent' - unexpected unlock fs/btrfs/extent-tree.c:8810:9: warning: context imbalance in 'btrfs_put_block_group_cache' - wrong count at exit What program is producing these warning? It looks like a static code analysis tool or something. https://sparse.wiki.kernel.org/index.php/Main_Page It understands a few semantics like the locking and reports when there's an unannotated function that has imbalanced locking, like the errors above. Cool, thanks Dave. -liubo Any static analyzer has some ratio of false positives so not every report means there's a bug. One would expect that a function 'btrfs_lock_cluster' would act as a lock and leaves some locks held at exit. The fix is to annotate it a after review. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix crash of btrfs_release_extent_buffer_page
This is actually inspired by Filipe's patch. When write_one_eb() fails on submit_extent_page(), it'll give up writing this eb and mark it with EXTENT_BUFFER_IOERR. So if it's not the last page that encounter the failure, there are some left pages which remain DIRTY, and if a later COW on this eb happens, ie. eb is COWed and freed, it'd run into BUG_ON in btrfs_release_extent_buffer_page() for the DIRTY page, ie. BUG_ON(PageDirty(page)); This adds the missing clear_page_dirty_for_io() for the rest pages of eb. Signed-off-by: Liu Bo bo.li@oracle.com --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index af0359d..e68a1d9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3597,6 +3597,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, if (unlikely(ret)) { for (; i num_pages; i++) { struct page *p = extent_buffer_page(eb, i); + clear_page_dirty_for_io(p); unlock_page(p); } } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
If it is unknown, which of these options have been used at btrfs creation time - is it possible to check the state of these options afterwards on a mounted or unmounted filesystem? 2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com: Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features: mixed-bg- mixed data and metadata block groups extref - increased hard-link limit per file to 65536 raid56 - raid56 extended format skinny-metadata - reduced size metadata extent refs no-holes- no explicit hole extents for files -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Btrfs: be aware of btree inode write errors to avoid fs corruption
On Tue, Sep 23, 2014 at 3:05 PM, Liu Bo bo.li@oracle.com wrote: On Mon, Sep 22, 2014 at 05:41:05PM +0100, Filipe Manana wrote: While we have a transaction ongoing, the VM might decide at any time to call btree_inode-i_mapping-a_ops-writepages(), which will start writeback of dirty pages belonging to btree nodes/leafs. This call might return an error or the writeback might finish with an error before we attempt to commit the running transaction. If this happens, we might have no way of knowing that such error happened when we are committing the transaction - because the pages might no longer be marked dirty nor tagged for writeback (if a subsequent modification to the extent buffer didn't happen before the transaction commit) which makes filemap_fdata[write|wait]_range unable to find such pages (even if they're marked with SetPageError). So if this happens we must abort the transaction, otherwise we commit a super block with btree roots that point to btree nodes/leafs whose content on disk is invalid - either garbage or the content of some node/leaf from a past generation that got cowed or deleted and is no longer valid (for this later case we end up getting error messages like parent transid verify failed on 10826481664 wanted 25748 found 29562 when reading btree nodes/leafs from disk). Good catch! Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's i_mapping would not be enough because we need to distinguish between log tree extents (not fatal) vs non-log tree extents (fatal) and because the next call to filemap_fdatawait_range() will catch and clear such errors in the mapping - and that call might be from a log sync and not from a transaction commit, which means we would not know about the error at transaction commit time. Also, checking for the eb flag EXTENT_BUFFER_IOERR at transaction commit time isn't done and would not be completely reliable, as the eb might be removed from memory and read back when trying to get it, which clears that flag right before reading the eb's pages from disk, making us not know about the previous write error. Using the BTRFS_INODE_BTREE_IO_ERR and BTRFS_INODE_BTREE_LOG_IO_ERR inode flags also makes us achieve the goal of AS_EIO/AS_ENOSPC when writepages() returns success, started writeback for all dirty pages and before filemap_fdatawait_range() is called, the writeback for all dirty pages had already finished with errors - because we were not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return success, as it could not know that writeback errors happened (the pages were no longer tagged for writeback). But there is a problem when the extent buffer with IO error is COWed and deleted, but the BTRFS_INODE_BTREE_IO_ERR flag still remains in btree_inode's runtime_flags, we'd still run into cleanup_transaction() as this patch expects. So I think we need to remove that BTREE_IO_ERR flag in this case. Thought about that, no good reason to not do it (yet at least). I'll think a bit about it and send a v2 if I can't find a reason. Well, I notice that you don't clear rest pages' DIRTY in the error case, Where exactly? so it can lead to hitting the BUG_ON in btrfs_release_extent_buffer_page() when the eb with IO error is COWed by a new good eb and gets itself freed then. I'm making a patch to add missing clear_page_dirty_for_io(). clear_page_dirty_for_io() is already called by write_one_eb(), and immediately after it tags the page(s) for writeback. If error happens during writeback, end_bio_extent_buffer_writepage() removes the writeback tag (from page and eb's flags). So either I'm missing something, or you didn't notice all this. thanks thanks, -liubo Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/btrfs_inode.h | 2 ++ fs/btrfs/extent_io.c | 69 +++--- fs/btrfs/transaction.c | 20 --- fs/btrfs/transaction.h | 3 +-- fs/btrfs/tree-log.c| 13 ++ 5 files changed, 93 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 3511031..dbe37dc 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -44,6 +44,8 @@ #define BTRFS_INODE_IN_DELALLOC_LIST 9 #define BTRFS_INODE_READDIO_NEED_LOCK10 #define BTRFS_INODE_HAS_PROPS11 +#define BTRFS_INODE_BTREE_IO_ERR 12 +#define BTRFS_INODE_BTREE_LOG_IO_ERR 13 /* in memory btrfs inode */ struct btrfs_inode { diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 91f866c..33b113b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -20,6 +20,7 @@ #include locking.h #include rcu-string.h #include backref.h +#include transaction.h static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -3606,6 +3607,68 @@ static void
Re: [PATCH] Btrfs: fix crash of btrfs_release_extent_buffer_page
On Tue, Sep 23, 2014 at 3:22 PM, Liu Bo bo.li@oracle.com wrote: This is actually inspired by Filipe's patch. When write_one_eb() fails on submit_extent_page(), it'll give up writing this eb and mark it with EXTENT_BUFFER_IOERR. So if it's not the last page that encounter the failure, there are some left pages which remain DIRTY, and if a later COW on this eb happens, ie. eb is COWed and freed, it'd run into BUG_ON in btrfs_release_extent_buffer_page() for the DIRTY page, ie. BUG_ON(PageDirty(page)); This adds the missing clear_page_dirty_for_io() for the rest pages of eb. Signed-off-by: Liu Bo bo.li@oracle.com Looks good. Thanks. Reviewed-by: Filipe Manana fdman...@suse.com --- fs/btrfs/extent_io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index af0359d..e68a1d9 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3597,6 +3597,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb, if (unlikely(ret)) { for (; i num_pages; i++) { struct page *p = extent_buffer_page(eb, i); + clear_page_dirty_for_io(p); unlock_page(p); } } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
On 2014-09-23 10:23, Tobias Holst wrote: If it is unknown, which of these options have been used at btrfs creation time - is it possible to check the state of these options afterwards on a mounted or unmounted filesystem? 2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com mailto:ahferro...@gmail.com: Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features: mixed-bg- mixed data and metadata block groups extref - increased hard-link limit per file to 65536 raid56 - raid56 extended format skinny-metadata - reduced size metadata extent refs no-holes- no explicit hole extents for files I don't think there is a specific tool for doing this, but some of them do show up in dmesg, for example skinny-metadata shows up as a mention of the FS having skinny extents. smime.p7s Description: S/MIME Cryptographic Signature
Re: fixes for btrfs check --repair
Hi, On Sat, Aug 30, 2014 at 01:00:40PM -0300, Alexandre Oliva wrote: Here are the patches. thanks, I've put them into the queue (branch with other fsck fixes). -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC] btrfs-progs: Add simple python front end to the search ioctl
This is a starting point for a debugfs style python interface using the search ioctl. For now it can only do one thing, which is to print out all the extents in a file and calculate the compression ratio. Over time it will grow more features, especially for the kinds of things we might run btrfs-debug-tree to find out. Expect the usage and output to change dramatically over time (don't hard code to it). Signed-off-by: Chris Mason c...@fb.com --- btrfs-debugfs | 296 ++ 1 file changed, 296 insertions(+) create mode 100755 btrfs-debugfs diff --git a/btrfs-debugfs b/btrfs-debugfs new file mode 100755 index 000..cf1d285 --- /dev/null +++ b/btrfs-debugfs @@ -0,0 +1,296 @@ +#!/usr/bin/env python2 +# +# Simple python program to print out all the extents of a single file +# LGPLv2 license +# Copyright Facebook 2014 + +import sys,os,struct,fcntl,ctypes,stat + +# helpers for max ints +maxu64 = (1L 64) - 1 +maxu32 = (1L 32) - 1 + +# the inode (like form stat) +BTRFS_INODE_ITEM_KEY = 1 +# backref to the directory +BTRFS_INODE_REF_KEY = 12 +# backref to the directory v2 +BTRFS_INODE_EXTREF_KEY = 13 +# xattr items +BTRFS_XATTR_ITEM_KEY = 24 +# orphans for list files +BTRFS_ORPHAN_ITEM_KEY = 48 +# treelog items for dirs +BTRFS_DIR_LOG_ITEM_KEY = 60 +BTRFS_DIR_LOG_INDEX_KEY = 72 +# dir items and dir indexes both hold filenames +BTRFS_DIR_ITEM_KEY = 84 +BTRFS_DIR_INDEX_KEY = 96 +# these are the file extent pointers +BTRFS_EXTENT_DATA_KEY = 108 +# csums +BTRFS_EXTENT_CSUM_KEY = 128 +# root item for subvols and snapshots +BTRFS_ROOT_ITEM_KEY = 132 +# root item backrefs +BTRFS_ROOT_BACKREF_KEY = 144 +BTRFS_ROOT_REF_KEY = 156 +# each allocated extent has an extent item +BTRFS_EXTENT_ITEM_KEY = 168 +# optimized extents for metadata only +BTRFS_METADATA_ITEM_KEY = 169 +# backrefs for extents +BTRFS_TREE_BLOCK_REF_KEY = 176 +BTRFS_EXTENT_DATA_REF_KEY = 178 +BTRFS_EXTENT_REF_V0_KEY = 180 +BTRFS_SHARED_BLOCK_REF_KEY = 182 +BTRFS_SHARED_DATA_REF_KEY = 184 +# one of these for each block group +BTRFS_BLOCK_GROUP_ITEM_KEY = 192 +# dev extents records which part of each device is allocated +BTRFS_DEV_EXTENT_KEY = 204 +# dev items describe devs +BTRFS_DEV_ITEM_KEY = 216 +# one for each chunk +BTRFS_CHUNK_ITEM_KEY = 228 +# qgroup info +BTRFS_QGROUP_STATUS_KEY = 240 +BTRFS_QGROUP_INFO_KEY = 242 +BTRFS_QGROUP_LIMIT_KEY = 244 +BTRFS_QGROUP_RELATION_KEY = 246 +# records balance progress +BTRFS_BALANCE_ITEM_KEY = 248 +# stats on device errors +BTRFS_DEV_STATS_KEY = 249 +BTRFS_DEV_REPLACE_KEY = 250 +BTRFS_STRING_ITEM_KEY = 253 + +# in the kernel sources, this is flattened +# btrfs_ioctl_search_args_v2. It includes both the btrfs_ioctl_search_key +# and the buffer. We're using a 64K buffer size. +# +args_buffer_size = 65536 +class btrfs_ioctl_search_args(ctypes.Structure): +_pack_ = 1 +_fields_ = [ (tree_id, ctypes.c_ulonglong), + (min_objectid, ctypes.c_ulonglong), + (max_objectid, ctypes.c_ulonglong), + (min_offset, ctypes.c_ulonglong), + (max_offset, ctypes.c_ulonglong), + (min_transid, ctypes.c_ulonglong), + (max_transid, ctypes.c_ulonglong), + (min_type, ctypes.c_uint), + (max_type, ctypes.c_uint), + (nr_items, ctypes.c_uint), + (unused, ctypes.c_uint), + (unused1, ctypes.c_ulonglong), + (unused2, ctypes.c_ulonglong), + (unused3, ctypes.c_ulonglong), + (unused4, ctypes.c_ulonglong), + (buf_size, ctypes.c_ulonglong), + (buf, ctypes.c_ubyte * args_buffer_size), + ] + +# the search ioctl resturns one header for each item +# +class btrfs_ioctl_search_header(ctypes.Structure): +_pack_ = 1 +_fields_ = [ (transid, ctypes.c_ulonglong), + (objectid, ctypes.c_ulonglong), + (offset, ctypes.c_ulonglong), + (type, ctypes.c_uint), + (len, ctypes.c_uint), + ] + +# the type field in btrfs_file_extent_item +BTRFS_FILE_EXTENT_INLINE = 0 +BTRFS_FILE_EXTENT_REG = 1 +BTRFS_FILE_EXTENT_PREALLOC = 2 + +class btrfs_file_extent_item(ctypes.LittleEndianStructure): +_pack_ = 1 +_fields_ = [ (generation, ctypes.c_ulonglong), + (ram_bytes, ctypes.c_ulonglong), + (compression, ctypes.c_ubyte), + (encryption, ctypes.c_ubyte), + (other_encoding, ctypes.c_ubyte * 2), + (type, ctypes.c_ubyte), + (disk_bytenr, ctypes.c_ulonglong), + (disk_num_bytes, ctypes.c_ulonglong), + (offset, ctypes.c_ulonglong), + (num_bytes, ctypes.c_ulonglong), + ] + +class btrfs_ioctl_search(): +def __init__(self): +self.args = btrfs_ioctl_search_args() +self.args.tree_id = 0 +
Re: [PATCH RFC] btrfs-progs: Add simple python front end to the search ioctl
On Tue, Sep 23, 2014 at 10:39 AM, Chris Mason c...@fb.com wrote: This is a starting point for a debugfs style python interface using the search ioctl. For now it can only do one thing, which is to print out all the extents in a file and calculate the compression ratio. Over time it will grow more features, especially for the kinds of things we might run btrfs-debug-tree to find out. Expect the usage and output to change dramatically over time (don't hard code to it). Signed-off-by: Chris Mason c...@fb.com --- btrfs-debugfs | 296 ++ 1 file changed, 296 insertions(+) create mode 100755 btrfs-debugfs diff --git a/btrfs-debugfs b/btrfs-debugfs new file mode 100755 index 000..cf1d285 --- /dev/null +++ b/btrfs-debugfs @@ -0,0 +1,296 @@ +#!/usr/bin/env python2 +# +# Simple python program to print out all the extents of a single file +# LGPLv2 license +# Copyright Facebook 2014 + +import sys,os,struct,fcntl,ctypes,stat + +# helpers for max ints +maxu64 = (1L 64) - 1 +maxu32 = (1L 32) - 1 + +# the inode (like form stat) +BTRFS_INODE_ITEM_KEY = 1 +# backref to the directory +BTRFS_INODE_REF_KEY = 12 +# backref to the directory v2 +BTRFS_INODE_EXTREF_KEY = 13 +# xattr items +BTRFS_XATTR_ITEM_KEY = 24 +# orphans for list files +BTRFS_ORPHAN_ITEM_KEY = 48 +# treelog items for dirs +BTRFS_DIR_LOG_ITEM_KEY = 60 +BTRFS_DIR_LOG_INDEX_KEY = 72 +# dir items and dir indexes both hold filenames +BTRFS_DIR_ITEM_KEY = 84 +BTRFS_DIR_INDEX_KEY = 96 +# these are the file extent pointers +BTRFS_EXTENT_DATA_KEY = 108 +# csums +BTRFS_EXTENT_CSUM_KEY = 128 +# root item for subvols and snapshots +BTRFS_ROOT_ITEM_KEY = 132 +# root item backrefs +BTRFS_ROOT_BACKREF_KEY = 144 +BTRFS_ROOT_REF_KEY = 156 +# each allocated extent has an extent item +BTRFS_EXTENT_ITEM_KEY = 168 +# optimized extents for metadata only +BTRFS_METADATA_ITEM_KEY = 169 +# backrefs for extents +BTRFS_TREE_BLOCK_REF_KEY = 176 +BTRFS_EXTENT_DATA_REF_KEY = 178 +BTRFS_EXTENT_REF_V0_KEY = 180 +BTRFS_SHARED_BLOCK_REF_KEY = 182 +BTRFS_SHARED_DATA_REF_KEY = 184 +# one of these for each block group +BTRFS_BLOCK_GROUP_ITEM_KEY = 192 +# dev extents records which part of each device is allocated +BTRFS_DEV_EXTENT_KEY = 204 +# dev items describe devs +BTRFS_DEV_ITEM_KEY = 216 +# one for each chunk +BTRFS_CHUNK_ITEM_KEY = 228 +# qgroup info +BTRFS_QGROUP_STATUS_KEY = 240 +BTRFS_QGROUP_INFO_KEY = 242 +BTRFS_QGROUP_LIMIT_KEY = 244 +BTRFS_QGROUP_RELATION_KEY = 246 +# records balance progress +BTRFS_BALANCE_ITEM_KEY = 248 +# stats on device errors +BTRFS_DEV_STATS_KEY = 249 +BTRFS_DEV_REPLACE_KEY = 250 +BTRFS_STRING_ITEM_KEY = 253 + +# in the kernel sources, this is flattened +# btrfs_ioctl_search_args_v2. It includes both the btrfs_ioctl_search_key +# and the buffer. We're using a 64K buffer size. +# +args_buffer_size = 65536 +class btrfs_ioctl_search_args(ctypes.Structure): Put comments like these in triple-quoted strings just inside the class or function you're defining; this makes them accessible using the standard help() system: class foo(bar): In the kernel sources, this is +_pack_ = 1 +_fields_ = [ (tree_id, ctypes.c_ulonglong), + (min_objectid, ctypes.c_ulonglong), + (max_objectid, ctypes.c_ulonglong), + (min_offset, ctypes.c_ulonglong), + (max_offset, ctypes.c_ulonglong), + (min_transid, ctypes.c_ulonglong), + (max_transid, ctypes.c_ulonglong), + (min_type, ctypes.c_uint), + (max_type, ctypes.c_uint), + (nr_items, ctypes.c_uint), + (unused, ctypes.c_uint), + (unused1, ctypes.c_ulonglong), + (unused2, ctypes.c_ulonglong), + (unused3, ctypes.c_ulonglong), + (unused4, ctypes.c_ulonglong), + (buf_size, ctypes.c_ulonglong), + (buf, ctypes.c_ubyte * args_buffer_size), + ] + +# the search ioctl resturns one header for each item +class btrfs_ioctl_search_header(ctypes.Structure): +_pack_ = 1 +_fields_ = [ (transid, ctypes.c_ulonglong), + (objectid, ctypes.c_ulonglong), + (offset, ctypes.c_ulonglong), + (type, ctypes.c_uint), + (len, ctypes.c_uint), + ] + +# the type field in btrfs_file_extent_item +BTRFS_FILE_EXTENT_INLINE = 0 +BTRFS_FILE_EXTENT_REG = 1 +BTRFS_FILE_EXTENT_PREALLOC = 2 + +class btrfs_file_extent_item(ctypes.LittleEndianStructure): +_pack_ = 1 +_fields_ = [ (generation, ctypes.c_ulonglong), + (ram_bytes, ctypes.c_ulonglong), + (compression, ctypes.c_ubyte), + (encryption, ctypes.c_ubyte), + (other_encoding, ctypes.c_ubyte * 2),
Re: [PATCH RFC] btrfs-progs: Add simple python front end to the search ioctl
Damn you gmail... -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] btrfs-progs: scan /proc/partitions not all of /dev with -d
Hi, all 5 patches will be in the next integration. I haven't tested them yet, seems it's a bit more important to make a more stable devel base for more updates you might want to send. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.
On 9/23/14 7:49 AM, Chris Mason wrote: On 09/23/2014 01:40 AM, Qu Wenruo wrote: [BUG] Originally when mount btrfs with -o subvol= mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with subvol= mount option. Thanks for working on this. Eric Sandeen (cc'd) was trying out something similar recently, so I want to make sure this doesn't conflict with his ideas. My ideas didn't get very far. ;) What I was after was a way for multiple subvolumes to have unique contexts. It looks like this might do the trick, as long as they are mounted on a unique mount point. Would this allow subvolume create to take a context, so that everything under /mnt/btrfs/subvol/ has a unique subvol-wide context? thanks, -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2 v2] Btrfs: be aware of btree inode write errors to avoid fs corruption
While we have a transaction ongoing, the VM might decide at any time to call btree_inode-i_mapping-a_ops-writepages(), which will start writeback of dirty pages belonging to btree nodes/leafs. This call might return an error or the writeback might finish with an error before we attempt to commit the running transaction. If this happens, we might have no way of knowing that such error happened when we are committing the transaction - because the pages might no longer be marked dirty nor tagged for writeback (if a subsequent modification to the extent buffer didn't happen before the transaction commit) which makes filemap_fdata[write|wait]_range unable to find such pages (even if they're marked with SetPageError). So if this happens we must abort the transaction, otherwise we commit a super block with btree roots that point to btree nodes/leafs whose content on disk is invalid - either garbage or the content of some node/leaf from a past generation that got cowed or deleted and is no longer valid (for this later case we end up getting error messages like parent transid verify failed on 10826481664 wanted 25748 found 29562 when reading btree nodes/leafs from disk). Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's i_mapping would not be enough because we need to distinguish between log tree extents (not fatal) vs non-log tree extents (fatal) and because the next call to filemap_fdatawait_range() will catch and clear such errors in the mapping - and that call might be from a log sync and not from a transaction commit, which means we would not know about the error at transaction commit time. Also, checking for the eb flag EXTENT_BUFFER_IOERR at transaction commit time isn't done and would not be completely reliable, as the eb might be removed from memory and read back when trying to get it, which clears that flag right before reading the eb's pages from disk, making us not know about the previous write error. Using the new counters eb_write_errors and log_eb_write_errors in the transaction also makes us achieve the goal of AS_EIO/AS_ENOSPC when writepages() returns success, started writeback for all dirty pages and before filemap_fdatawait_range() is called, the writeback for all dirty pages had already finished with errors - because we were not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return success, as it could not know that writeback errors happened (the pages were no longer tagged for writeback). Signed-off-by: Filipe Manana fdman...@suse.com --- V2: If an extent buffer's write failed but it's also deleted from the tree before the transaction commits, don't abort the transaction with -EIO, since the unwritten node/leaf it represents can't be pointed to by any other node in a tree. fs/btrfs/disk-io.c | 4 +-- fs/btrfs/extent-tree.c | 8 ++ fs/btrfs/extent_io.c | 76 +- fs/btrfs/extent_io.h | 3 +- fs/btrfs/transaction.c | 22 +-- fs/btrfs/transaction.h | 5 ++-- fs/btrfs/tree-log.c| 13 + 7 files changed, 111 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 23393ec..8b54acf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio, goto err; eb-read_mirror = mirror; - if (test_bit(EXTENT_BUFFER_IOERR, eb-bflags)) { + if (test_bit(EXTENT_BUFFER_READ_ERR, eb-bflags)) { ret = -EIO; goto err; } @@ -683,7 +683,7 @@ static int btree_io_failed_hook(struct page *page, int failed_mirror) struct btrfs_root *root = BTRFS_I(page-mapping-host)-root; eb = (struct extent_buffer *)page-private; - set_bit(EXTENT_BUFFER_IOERR, eb-bflags); + set_bit(EXTENT_BUFFER_READ_ERR, eb-bflags); eb-read_mirror = failed_mirror; atomic_dec(eb-io_pages); if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, eb-bflags)) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ef0845d..608814b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6221,6 +6221,14 @@ out: * anymore. */ clear_bit(EXTENT_BUFFER_CORRUPT, buf-bflags); + /* +* The unwritten node/leaf (due to an IO error) isn't pointed to by any +* other node in a tree, so it's safe to forget about the write error +* and avoid a transaction abort. +*/ + if (test_and_clear_bit(EXTENT_BUFFER_WRITE_ERR, buf-bflags)) + atomic_dec(trans-transaction-eb_write_errors); + btrfs_put_block_group(cache); } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 91f866c..e21f200 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -20,6 +20,7 @@ #include locking.h #include rcu-string.h #include backref.h +#include transaction.h static struct kmem_cache
[PATCH 2/2 v3] Btrfs: be aware of btree inode write errors to avoid fs corruption
While we have a transaction ongoing, the VM might decide at any time to call btree_inode-i_mapping-a_ops-writepages(), which will start writeback of dirty pages belonging to btree nodes/leafs. This call might return an error or the writeback might finish with an error before we attempt to commit the running transaction. If this happens, we might have no way of knowing that such error happened when we are committing the transaction - because the pages might no longer be marked dirty nor tagged for writeback (if a subsequent modification to the extent buffer didn't happen before the transaction commit) which makes filemap_fdata[write|wait]_range unable to find such pages (even if they're marked with SetPageError). So if this happens we must abort the transaction, otherwise we commit a super block with btree roots that point to btree nodes/leafs whose content on disk is invalid - either garbage or the content of some node/leaf from a past generation that got cowed or deleted and is no longer valid (for this later case we end up getting error messages like parent transid verify failed on 10826481664 wanted 25748 found 29562 when reading btree nodes/leafs from disk). Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's i_mapping would not be enough because we need to distinguish between log tree extents (not fatal) vs non-log tree extents (fatal) and because the next call to filemap_fdatawait_range() will catch and clear such errors in the mapping - and that call might be from a log sync and not from a transaction commit, which means we would not know about the error at transaction commit time. Also, checking for the eb flag EXTENT_BUFFER_IOERR at transaction commit time isn't done and would not be completely reliable, as the eb might be removed from memory and read back when trying to get it, which clears that flag right before reading the eb's pages from disk, making us not know about the previous write error. Using the new counters eb_write_errors and log_eb_write_errors in the transaction also makes us achieve the goal of AS_EIO/AS_ENOSPC when writepages() returns success, started writeback for all dirty pages and before filemap_fdatawait_range() is called, the writeback for all dirty pages had already finished with errors - because we were not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return success, as it could not know that writeback errors happened (the pages were no longer tagged for writeback). Signed-off-by: Filipe Manana fdman...@suse.com --- V2: If an extent buffer's write failed but it's also deleted from the tree before the transaction commits, don't abort the transaction with -EIO, since the unwritten node/leaf it represents can't be pointed to by any other node in a tree. V3: Correct V2, missed unstaged changes. fs/btrfs/disk-io.c | 4 +-- fs/btrfs/extent-tree.c | 8 ++ fs/btrfs/extent_io.c | 76 +- fs/btrfs/extent_io.h | 3 +- fs/btrfs/transaction.c | 22 +-- fs/btrfs/transaction.h | 5 ++-- fs/btrfs/tree-log.c| 13 + 7 files changed, 111 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 23393ec..8b54acf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio, goto err; eb-read_mirror = mirror; - if (test_bit(EXTENT_BUFFER_IOERR, eb-bflags)) { + if (test_bit(EXTENT_BUFFER_READ_ERR, eb-bflags)) { ret = -EIO; goto err; } @@ -683,7 +683,7 @@ static int btree_io_failed_hook(struct page *page, int failed_mirror) struct btrfs_root *root = BTRFS_I(page-mapping-host)-root; eb = (struct extent_buffer *)page-private; - set_bit(EXTENT_BUFFER_IOERR, eb-bflags); + set_bit(EXTENT_BUFFER_READ_ERR, eb-bflags); eb-read_mirror = failed_mirror; atomic_dec(eb-io_pages); if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, eb-bflags)) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ef0845d..608814b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6221,6 +6221,14 @@ out: * anymore. */ clear_bit(EXTENT_BUFFER_CORRUPT, buf-bflags); + /* +* The unwritten node/leaf (due to an IO error) isn't pointed to by any +* other node in a tree, so it's safe to forget about the write error +* and avoid a transaction abort. +*/ + if (test_and_clear_bit(EXTENT_BUFFER_WRITE_ERR, buf-bflags)) + atomic_dec(trans-transaction-eb_write_errors); + btrfs_put_block_group(cache); } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 91f866c..da1706f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -20,6 +20,7 @@ #include locking.h #include rcu-string.h #include backref.h +#include transaction.h
Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.
Original Message Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label. From: Eric Sandeen sand...@redhat.com To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2014年09月24日 02:51 On 9/23/14 7:49 AM, Chris Mason wrote: On 09/23/2014 01:40 AM, Qu Wenruo wrote: [BUG] Originally when mount btrfs with -o subvol= mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with subvol= mount option. Thanks for working on this. Eric Sandeen (cc'd) was trying out something similar recently, so I want to make sure this doesn't conflict with his ideas. My ideas didn't get very far. ;) What I was after was a way for multiple subvolumes to have unique contexts. It looks like this might do the trick, as long as they are mounted on a unique mount point. Would this allow subvolume create to take a context, so that everything under /mnt/btrfs/subvol/ has a unique subvol-wide context? thanks, -Eric Did you mean the following situation? /dev/sdb default subvol(FS_TREE) mounted on /mnt/default with context A /dev/sdb subvol=subvol mounted on /mnt/subvol with context B If that's your goal, I am afraid that my patch can't achieve it and even worse, will even forbid it. :( SELinux doesn't allow same superblock mounted with different context, and the patch follows it. If SELinux is modified to allow same superblock different context, then my patch also needs to be modified. Thanks, Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: general thoughts and questions + general and RAID5/6 stability?
Original Message Subject: Re: general thoughts and questions + general and RAID5/6 stability? From: Tobias Holst to...@tobby.eu To: linux-btrfs@vger.kernel.org Date: 2014年09月23日 22:24 If it is unknown, which of these options have been used at btrfs creation time - is it possible to check the state of these options afterwards on a mounted or unmounted filesystem? For mounted fs, sysfs can be used to see the feature enabled: /sys/fs/btrfs/UUID/features/ For unmounted fs, maybe not the best one, btrfs-show-super can show the incompat_flags in hex, and we can check kernel tree/fs/btrfs/ctree.h for BTRFS_FEATURE_INCOMPAT_## for the bits and calculate by hand... (Would it be better to add human readable output for btrfs-show-super?) Thanks, Qu 2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com: Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features: mixed-bg- mixed data and metadata block groups extref - increased hard-link limit per file to 65536 raid56 - raid56 extended format skinny-metadata - reduced size metadata extent refs no-holes- no explicit hole extents for files -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Help Me Learn Btrfs Code Base
Hello Guys, I am wondering if anybody here would like to mentor me in the btrfs code base in if they have any free time. I am not asking for much just someone to answer my questions and help me learn the code base well enough to help out. Thanks for Any Help, Nick -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: Add human readable incompat flags output for btrfs-show-super
Add human readable incompat flags output for btrfs-show-super, now no longer needs to calculate the hex flags by hand. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- btrfs-show-super.c | 53 + 1 file changed, 53 insertions(+) diff --git a/btrfs-show-super.c b/btrfs-show-super.c index 38c5d26..c591865 100644 --- a/btrfs-show-super.c +++ b/btrfs-show-super.c @@ -285,6 +285,58 @@ static void print_backup_roots(struct btrfs_super_block *sb) } } +struct readable_flag_entry { + u64 bit; + char *output; +}; + +#define DEF_INCOMPAT_FLAG_ENTRY(bit_name) \ + {BTRFS_FEATURE_INCOMPAT_##bit_name, #bit_name} + +struct readable_flag_entry incompat_flags_array[] = { + DEF_INCOMPAT_FLAG_ENTRY(MIXED_BACKREF), + DEF_INCOMPAT_FLAG_ENTRY(DEFAULT_SUBVOL), + DEF_INCOMPAT_FLAG_ENTRY(MIXED_GROUPS), + DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZO), + DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZOv2), + DEF_INCOMPAT_FLAG_ENTRY(BIG_METADATA), + DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF), + DEF_INCOMPAT_FLAG_ENTRY(RAID56), + DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA), + DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES) +}; +static const int incompat_flags_num = sizeof(incompat_flags_array) / + sizeof(struct readable_flag_entry); + +static void print_readable_incompat_flag(u64 flag) +{ + int i; + int first = 1; + struct readable_flag_entry *entry; + + if (!flag) + return; + printf(\t\t\t( ); + for (i = 0; i incompat_flags_num; i++) { + entry = incompat_flags_array + i; + if (flag entry-bit) { + if (first) + printf(%s , entry-output); + else + printf(|\n\t\t\t %s , entry-output); + } + first = 0; + } + flag = ~BTRFS_FEATURE_INCOMPAT_SUPP; + if (flag) { + if (first) + printf(unknown flag:%llx , flag); + else + printf(|\n\t\t\t unknown flag:%llx , flag); + } + printf()\n); +} + static void dump_superblock(struct btrfs_super_block *sb, int full) { int i; @@ -364,6 +416,7 @@ static void dump_superblock(struct btrfs_super_block *sb, int full) (unsigned long long)btrfs_super_compat_ro_flags(sb)); printf(incompat_flags\t\t0x%llx\n, (unsigned long long)btrfs_super_incompat_flags(sb)); + print_readable_incompat_flag(btrfs_super_incompat_flags(sb)); printf(csum_type\t\t%llu\n, (unsigned long long)btrfs_super_csum_type(sb)); printf(csum_size\t\t%llu\n, -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.
On 9/23/14 7:31 PM, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label. From: Eric Sandeen sand...@redhat.com To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2014年09月24日 02:51 On 9/23/14 7:49 AM, Chris Mason wrote: On 09/23/2014 01:40 AM, Qu Wenruo wrote: [BUG] Originally when mount btrfs with -o subvol= mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with subvol= mount option. Thanks for working on this. Eric Sandeen (cc'd) was trying out something similar recently, so I want to make sure this doesn't conflict with his ideas. My ideas didn't get very far. ;) What I was after was a way for multiple subvolumes to have unique contexts. It looks like this might do the trick, as long as they are mounted on a unique mount point. Would this allow subvolume create to take a context, so that everything under /mnt/btrfs/subvol/ has a unique subvol-wide context? thanks, -Eric Did you mean the following situation? /dev/sdb default subvol(FS_TREE) mounted on /mnt/default with context A /dev/sdb subvol=subvol mounted on /mnt/subvol with context B If that's your goal, I am afraid that my patch can't achieve it and even worse, will even forbid it. :( SELinux doesn't allow same superblock mounted with different context, and the patch follows it. If SELinux is modified to allow same superblock different context, then my patch also needs to be modified. oh, ok, I see. I don't think that my wish should disallow your patch. For the problem I was looking at, I think the only way forward would require some significant selinux modification, and treating a subvol root essentially like a superblock. So ... don't let me slow you down, at least for now. ;) -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.
Original Message Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label. From: Eric Sandeen sand...@redhat.com To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com, linux-btrfs@vger.kernel.org Date: 2014年09月24日 11:33 On 9/23/14 7:31 PM, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label. From: Eric Sandeen sand...@redhat.com To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2014年09月24日 02:51 On 9/23/14 7:49 AM, Chris Mason wrote: On 09/23/2014 01:40 AM, Qu Wenruo wrote: [BUG] Originally when mount btrfs with -o subvol= mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with subvol= mount option. Thanks for working on this. Eric Sandeen (cc'd) was trying out something similar recently, so I want to make sure this doesn't conflict with his ideas. My ideas didn't get very far. ;) What I was after was a way for multiple subvolumes to have unique contexts. It looks like this might do the trick, as long as they are mounted on a unique mount point. Would this allow subvolume create to take a context, so that everything under /mnt/btrfs/subvol/ has a unique subvol-wide context? thanks, -Eric Did you mean the following situation? /dev/sdb default subvol(FS_TREE) mounted on /mnt/default with context A /dev/sdb subvol=subvol mounted on /mnt/subvol with context B If that's your goal, I am afraid that my patch can't achieve it and even worse, will even forbid it. :( SELinux doesn't allow same superblock mounted with different context, and the patch follows it. If SELinux is modified to allow same superblock different context, then my patch also needs to be modified. oh, ok, I see. I don't think that my wish should disallow your patch. For the problem I was looking at, I think the only way forward would require some significant selinux modification, and treating a subvol root essentially like a superblock. So ... don't let me slow you down, at least for now. ;) -Eric BTW, since with the patch btrfs can in fact don't call security_sb_set_mnt_opts() if btrfs wants, what about set context to the dentry or something like that, but not set to superblock? I may be wrong, since I am still not famaliar with security parts... Thanks, Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2 v4] Btrfs: be aware of btree inode write errors to avoid fs corruption
While we have a transaction ongoing, the VM might decide at any time to call btree_inode-i_mapping-a_ops-writepages(), which will start writeback of dirty pages belonging to btree nodes/leafs. This call might return an error or the writeback might finish with an error before we attempt to commit the running transaction. If this happens, we might have no way of knowing that such error happened when we are committing the transaction - because the pages might no longer be marked dirty nor tagged for writeback (if a subsequent modification to the extent buffer didn't happen before the transaction commit) which makes filemap_fdata[write|wait]_range unable to find such pages (even if they're marked with SetPageError). So if this happens we must abort the transaction, otherwise we commit a super block with btree roots that point to btree nodes/leafs whose content on disk is invalid - either garbage or the content of some node/leaf from a past generation that got cowed or deleted and is no longer valid (for this later case we end up getting error messages like parent transid verify failed on 10826481664 wanted 25748 found 29562 when reading btree nodes/leafs from disk). Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's i_mapping would not be enough because we need to distinguish between log tree extents (not fatal) vs non-log tree extents (fatal) and because the next call to filemap_fdatawait_range() will catch and clear such errors in the mapping - and that call might be from a log sync and not from a transaction commit, which means we would not know about the error at transaction commit time. Also, checking for the eb flag EXTENT_BUFFER_IOERR at transaction commit time isn't done and would not be completely reliable, as the eb might be removed from memory and read back when trying to get it, which clears that flag right before reading the eb's pages from disk, making us not know about the previous write error. Using the new counters eb_write_errors and log_eb_write_errors in the transaction also makes us achieve the goal of AS_EIO/AS_ENOSPC when writepages() returns success, started writeback for all dirty pages and before filemap_fdatawait_range() is called, the writeback for all dirty pages had already finished with errors - because we were not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return success, as it could not know that writeback errors happened (the pages were no longer tagged for writeback). Signed-off-by: Filipe Manana fdman...@suse.com --- V2: If an extent buffer's write failed but it's also deleted from the tree before the transaction commits, don't abort the transaction with -EIO, since the unwritten node/leaf it represents can't be pointed to by any other node in a tree. V3: Correct V2, missed unstaged changes. V4: Use root's key to figure out which counter to update. fs/btrfs/disk-io.c | 4 +-- fs/btrfs/extent-tree.c | 12 + fs/btrfs/extent_io.c | 71 +- fs/btrfs/extent_io.h | 3 ++- fs/btrfs/transaction.c | 22 +--- fs/btrfs/transaction.h | 5 ++-- fs/btrfs/tree-log.c| 13 + 7 files changed, 110 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 23393ec..8b54acf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio, goto err; eb-read_mirror = mirror; - if (test_bit(EXTENT_BUFFER_IOERR, eb-bflags)) { + if (test_bit(EXTENT_BUFFER_READ_ERR, eb-bflags)) { ret = -EIO; goto err; } @@ -683,7 +683,7 @@ static int btree_io_failed_hook(struct page *page, int failed_mirror) struct btrfs_root *root = BTRFS_I(page-mapping-host)-root; eb = (struct extent_buffer *)page-private; - set_bit(EXTENT_BUFFER_IOERR, eb-bflags); + set_bit(EXTENT_BUFFER_READ_ERR, eb-bflags); eb-read_mirror = failed_mirror; atomic_dec(eb-io_pages); if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, eb-bflags)) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ef0845d..bdacd33 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6221,6 +6221,18 @@ out: * anymore. */ clear_bit(EXTENT_BUFFER_CORRUPT, buf-bflags); + /* +* The unwritten node/leaf (due to an IO error) isn't pointed to by any +* other node in a tree, so it's safe to forget about the write error +* and avoid a transaction abort. +*/ + if (test_and_clear_bit(EXTENT_BUFFER_WRITE_ERR, buf-bflags)) { + if (root-root_key.objectid == BTRFS_TREE_LOG_OBJECTID) + atomic_dec(trans-transaction-log_eb_write_errors); + else + atomic_dec(trans-transaction-eb_write_errors); + } + btrfs_put_block_group(cache);
[PATCH v2] btrfs-progs: Add human readable incompat flags output for btrfs-show-super
Add human readable incompat flags output for btrfs-show-super, now no longer needs to calculate the hex flags by hand. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- changelog: v2: Add the mising 0x before hex unknown incompat flags --- btrfs-show-super.c | 53 + 1 file changed, 53 insertions(+) diff --git a/btrfs-show-super.c b/btrfs-show-super.c index 38c5d26..456dbd8 100644 --- a/btrfs-show-super.c +++ b/btrfs-show-super.c @@ -285,6 +285,58 @@ static void print_backup_roots(struct btrfs_super_block *sb) } } +struct readable_flag_entry { + u64 bit; + char *output; +}; + +#define DEF_INCOMPAT_FLAG_ENTRY(bit_name) \ + {BTRFS_FEATURE_INCOMPAT_##bit_name, #bit_name} + +struct readable_flag_entry incompat_flags_array[] = { + DEF_INCOMPAT_FLAG_ENTRY(MIXED_BACKREF), + DEF_INCOMPAT_FLAG_ENTRY(DEFAULT_SUBVOL), + DEF_INCOMPAT_FLAG_ENTRY(MIXED_GROUPS), + DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZO), + DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZOv2), + DEF_INCOMPAT_FLAG_ENTRY(BIG_METADATA), + DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF), + DEF_INCOMPAT_FLAG_ENTRY(RAID56), + DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA), + DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES) +}; +static const int incompat_flags_num = sizeof(incompat_flags_array) / + sizeof(struct readable_flag_entry); + +static void print_readable_incompat_flag(u64 flag) +{ + int i; + int first = 1; + struct readable_flag_entry *entry; + + if (!flag) + return; + printf(\t\t\t( ); + for (i = 0; i incompat_flags_num; i++) { + entry = incompat_flags_array + i; + if (flag entry-bit) { + if (first) + printf(%s , entry-output); + else + printf(|\n\t\t\t %s , entry-output); + } + first = 0; + } + flag = ~BTRFS_FEATURE_INCOMPAT_SUPP; + if (flag) { + if (first) + printf(unknown flag: 0x%llx , flag); + else + printf(|\n\t\t\t unknown flag: 0x%llx , flag); + } + printf()\n); +} + static void dump_superblock(struct btrfs_super_block *sb, int full) { int i; @@ -364,6 +416,7 @@ static void dump_superblock(struct btrfs_super_block *sb, int full) (unsigned long long)btrfs_super_compat_ro_flags(sb)); printf(incompat_flags\t\t0x%llx\n, (unsigned long long)btrfs_super_incompat_flags(sb)); + print_readable_incompat_flag(btrfs_super_incompat_flags(sb)); printf(csum_type\t\t%llu\n, (unsigned long long)btrfs_super_csum_type(sb)); printf(csum_size\t\t%llu\n, -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html