Re: Sparse Warnings about locks in extent-tree.c

2014-09-23 Thread Liu Bo
On Mon, Sep 22, 2014 at 09:57:35PM -0400, nick wrote:
 Hello Btfs Developers,
 I am new so am unsure of how to fix this but we are hitting some sparse 
 warnings about unlock/lock is having 
 a wrong count when exiting certain functions in extent-tree.c. I will paste 
 the warnings below for you guys.
 Hope this is of help.
 Cheers,
 Nick  
 fs/btrfs/extent-tree.c:6386:39: warning: context imbalance in 
 'btrfs_lock_cluster' - wrong count at exit
 fs/btrfs/extent-tree.c:6663:44: warning: context imbalance in 
 'find_free_extent' - unexpected unlock
 fs/btrfs/extent-tree.c:8810:9: warning: context imbalance in 
 'btrfs_put_block_group_cache' - wrong count at exit

Hi Nick,

What program is producing these warning?  It looks like a static code analysis
tool or something.

thanks,
-liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Corrupted metadata. 'btrfs check' fails repair due to an assertion failure: '!(root-ref_cows trans-transid != root-last_trans)'

2014-09-23 Thread Jaap Pieroen
I forgot the attachment. Here it is.
# btrfs-debug-tree -b 13415158087680 /dev/sdh
leaf 13415158087680 items 90 free space 8577 generation 140065 owner 5007
fs uuid 20ccaf09-54ea-486e-9495-9dc91b933e9c
chunk uuid f5a0dea1-b250-4ea4-bf7f-50d30401a708
item 0 key (7516 DIR_INDEX 102) itemoff 16232 itemsize 51
location key (7617 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082290.dng_original
item 1 key (7516 DIR_INDEX 103) itemoff 16190 itemsize 42
location key (7618 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082293.dng
item 2 key (7516 DIR_INDEX 104) itemoff 16139 itemsize 51
location key (7619 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082293.dng_original
item 3 key (7516 DIR_INDEX 105) itemoff 16097 itemsize 42
location key (7620 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082295.dng
item 4 key (7516 DIR_INDEX 106) itemoff 16046 itemsize 51
location key (7621 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082295.dng_original
item 5 key (7516 DIR_INDEX 107) itemoff 16004 itemsize 42
location key (7622 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082299.dng
item 6 key (7516 DIR_INDEX 108) itemoff 15953 itemsize 51
location key (7623 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082299.dng_original
item 7 key (7516 DIR_INDEX 536871021) itemoff 15911 itemsize 42
location key (7624 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082321.dng
item 8 key (7516 DIR_INDEX 110) itemoff 15860 itemsize 51
location key (7625 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082321.dng_original
item 9 key (7516 DIR_INDEX 111) itemoff 15818 itemsize 42
location key (7626 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082328.dng
item 10 key (7516 DIR_INDEX 112) itemoff 15767 itemsize 51
location key (7627 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082328.dng_original
item 11 key (7516 DIR_INDEX 113) itemoff 15725 itemsize 42
location key (7628 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082349.dng
item 12 key (7516 DIR_INDEX 114) itemoff 15674 itemsize 51
location key (7629 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082349.dng_original
item 13 key (7516 DIR_INDEX 115) itemoff 15632 itemsize 42
location key (7630 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082350.dng
item 14 key (7516 DIR_INDEX 116) itemoff 15581 itemsize 51
location key (7631 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082350.dng_original
item 15 key (7516 DIR_INDEX 117) itemoff 15539 itemsize 42
location key (7632 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082351.dng
item 16 key (7516 DIR_INDEX 118) itemoff 15488 itemsize 51
location key (7633 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082351.dng_original
item 17 key (7516 DIR_INDEX 119) itemoff 15446 itemsize 42
location key (7634 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082356.dng
item 18 key (7516 DIR_INDEX 120) itemoff 15395 itemsize 51
location key (7635 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082356.dng_original
item 19 key (7516 DIR_INDEX 121) itemoff 15353 itemsize 42
location key (7636 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082357.dng
item 20 key (7516 DIR_INDEX 122) itemoff 15302 itemsize 51
location key (7637 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082357.dng_original
item 21 key (7516 DIR_INDEX 123) itemoff 15260 itemsize 42
location key (7638 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082358.dng
item 22 key (7516 DIR_INDEX 124) itemoff 15209 itemsize 51
location key (7639 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082358.dng_original
item 23 key (7516 DIR_INDEX 125) itemoff 15167 itemsize 42
location key (7640 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: P4082359.dng
item 24 key (7516 DIR_INDEX 126) itemoff 15116 itemsize 51
location key (7641 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: P4082359.dng_original
item 25 key (7516 DIR_INDEX 127) itemoff 15074 itemsize 42

Re: [PATCH v4] btrfs-progs: fix page align issue for lzo compress in restore

2014-09-23 Thread Gui Hecheng
On Tue, 2014-09-23 at 10:25 +0800, Gui Hecheng wrote:
 When runing restore under lzo compression, bad compress length
 problems are encountered.
 It is because there is a page align problem with the @decompress_lzo,
 as follows:
   |--| ||-| |--|...|--|
 page ^page   page
  |
 3 bytes left
 
   When lzo compress pages im RAM, lzo will ensure that
   the 4 bytes len will be in one page as a whole.
   There is a situation that 3 (or less) bytes are left
   at the end of a page, and then the 4 bytes len is
   stored at the start of the next page.
   But the @decompress_lzo doesn't goto the start of
   the next page and continue to read the next 4 bytes
   which is across two pages, so a random value is fetched
   as a bad compress length.
 
 So we check page alignment every time before we are going to
 fetch the next @len and after the former piece of data is decompressed.
 If the current page that we reach has less than 4 bytes left,
 then we should fetch the next @len at the start of next page.
 
 Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
 Reviewed-by: Marc Dietrich marvi...@gmx.de
 ---
 changelog
   v1-v2: adopt alignment check method suggested by Marc
   v2-v3: make code more readable
   v3-v4: keep type safety
 ---
  cmds-restore.c | 29 +++--
  1 file changed, 27 insertions(+), 2 deletions(-)
 
 diff --git a/cmds-restore.c b/cmds-restore.c
 index 38a131e..fa5d5d1 100644
 --- a/cmds-restore.c
 +++ b/cmds-restore.c
 @@ -56,7 +56,10 @@ static int get_xattrs = 0;
  static int dry_run = 0;
  
  #define LZO_LEN 4
 -#define PAGE_CACHE_SIZE 4096
 +#define PAGE_CACHE_SIZE 4096UL
 +#define PAGE_CACHE_MASK (~(PAGE_CACHE_SIZE - 1))
 +#define PAGE_CACHE_ALIGN(addr) (((addr) + PAGE_CACHE_SIZE - 1)   \
 +  PAGE_CACHE_MASK)
  #define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)
  
  static int decompress_zlib(char *inbuf, char *outbuf, u64 compress_len,
 @@ -93,6 +96,28 @@ static inline size_t read_compress_length(unsigned char 
 *buf)
   return le32_to_cpu(dlen);
  }
  
 +static void align_if_need(size_t *tot_in, size_t *in_len)
 +{
 + size_t tot_in_aligned;
 + size_t bytes_left;
 +
 + tot_in_aligned = PAGE_CACHE_ALIGN(*tot_in);
 + bytes_left = tot_in_aligned - *tot_in;
 +
 + if (bytes_left = LZO_LEN)
 + return;
 +
 + /*
 +  * The LZO_LEN bytes is guaranteed to be
 +  * in one page as a whole, so if a page
 +  * has fewer than LZO_LEN bytes left,
 +  * the LZO_LEN bytes should be fetched
 +  * at the start of the next page
 +  */
 + *in_len += bytes_left;
 + *tot_in = tot_in_aligned;
 +}
 +
  static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 
 compress_len,
 u64 *decompress_len)
  {
 @@ -135,8 +160,8 @@ static int decompress_lzo(unsigned char *inbuf, char 
 *outbuf, u64 compress_len,
   }
   out_len += new_len;
   outbuf += new_len;
 + align_if_need(tot_in, in_len);
   inbuf += in_len;
 - tot_in += in_len;
   }
  
   *decompress_len = out_len;

Sorry, please scratch this one, the comments should be reformated. 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] btrfs-progs: fix page align issue for lzo compress in restore

2014-09-23 Thread Gui Hecheng
When runing restore under lzo compression, bad compress length
problems are encountered.
It is because there is a page align problem with the @decompress_lzo,
as follows:
|--| ||-| |--|...|--|
  page ^page   page
   |
  3 bytes left

When lzo compress pages im RAM, lzo will ensure that
the 4 bytes len will be in one page as a whole.
There is a situation that 3 (or less) bytes are left
at the end of a page, and then the 4 bytes len is
stored at the start of the next page.
But the @decompress_lzo doesn't goto the start of
the next page and continue to read the next 4 bytes
which is across two pages, so a random value is fetched
as a bad compress length.

So we check page alignment every time before we are going to
fetch the next @len and after the former piece of data is decompressed.
If the current page that we reach has less than 4 bytes left,
then we should fetch the next @len at the start of next page.

Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
Reviewed-by: Marc Dietrich marvi...@gmx.de
---
changelog
v1-v2: adopt alignment check method suggested by Marc
v2-v3: make code more readable
v3-v4: keep type safety  reformat comments
---
 cmds-restore.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e09acc4..1fe2df0 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -56,7 +56,10 @@ static int get_xattrs = 0;
 static int dry_run = 0;
 
 #define LZO_LEN 4
-#define PAGE_CACHE_SIZE 4096
+#define PAGE_CACHE_SIZE 4096UL
+#define PAGE_CACHE_MASK (~(PAGE_CACHE_SIZE - 1))
+#define PAGE_CACHE_ALIGN(addr) (((addr) + PAGE_CACHE_SIZE - 1) \
+PAGE_CACHE_MASK)
 #define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)
 
 static int decompress_zlib(char *inbuf, char *outbuf, u64 compress_len,
@@ -93,6 +96,26 @@ static inline size_t read_compress_length(unsigned char *buf)
return le32_to_cpu(dlen);
 }
 
+static void align_if_need(size_t *tot_in, size_t *in_len)
+{
+   size_t tot_in_aligned;
+   size_t bytes_left;
+
+   tot_in_aligned = PAGE_CACHE_ALIGN(*tot_in);
+   bytes_left = tot_in_aligned - *tot_in;
+
+   if (bytes_left = LZO_LEN)
+   return;
+
+   /*
+* The LZO_LEN bytes is guaranteed to be in one page as a whole,
+* so if a page has fewer than LZO_LEN bytes left, the LZO_LEN bytes
+* should be fetched at the start of the next page
+*/
+   *in_len += bytes_left;
+   *tot_in = tot_in_aligned;
+}
+
 static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len,
  u64 *decompress_len)
 {
@@ -135,8 +158,8 @@ static int decompress_lzo(unsigned char *inbuf, char 
*outbuf, u64 compress_len,
}
out_len += new_len;
outbuf += new_len;
+   align_if_need(tot_in, in_len);
inbuf += in_len;
-   tot_in += in_len;
}
 
*decompress_len = out_len;
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 11/12] security, crypto: LLVMLinux: Remove VLAIS from ima_crypto.c

2014-09-23 Thread Dmitry Kasatkin
On 23/09/14 07:42, beh...@converseincode.com wrote:
 From: Behan Webster beh...@converseincode.com

 Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99
 compliant equivalent. This patch allocates the appropriate amount of memory
 using a char array using the SHASH_DESC_ON_STACK macro.

 The new code can be compiled with both gcc and clang.

 Signed-off-by: Behan Webster beh...@converseincode.com
 Reviewed-by: Mark Charlebois charl...@gmail.com
 Reviewed-by: Jan-Simon Möller dl...@gmx.de
 Acked-by: Herbert Xu herb...@gondor.apana.org.au
 Cc: t...@linutronix.de

Looks good. Thanks.

Acked-by: Dmitry Kasatkin d.kasat...@samsung.com


 ---
  security/integrity/ima/ima_crypto.c | 47 
 +++--
  1 file changed, 19 insertions(+), 28 deletions(-)

 diff --git a/security/integrity/ima/ima_crypto.c 
 b/security/integrity/ima/ima_crypto.c
 index 0bd7328..e35f5d9 100644
 --- a/security/integrity/ima/ima_crypto.c
 +++ b/security/integrity/ima/ima_crypto.c
 @@ -380,17 +380,14 @@ static int ima_calc_file_hash_tfm(struct file *file,
   loff_t i_size, offset = 0;
   char *rbuf;
   int rc, read = 0;
 - struct {
 - struct shash_desc shash;
 - char ctx[crypto_shash_descsize(tfm)];
 - } desc;
 + SHASH_DESC_ON_STACK(shash, tfm);
  
 - desc.shash.tfm = tfm;
 - desc.shash.flags = 0;
 + shash-tfm = tfm;
 + shash-flags = 0;
  
   hash-length = crypto_shash_digestsize(tfm);
  
 - rc = crypto_shash_init(desc.shash);
 + rc = crypto_shash_init(shash);
   if (rc != 0)
   return rc;
  
 @@ -420,7 +417,7 @@ static int ima_calc_file_hash_tfm(struct file *file,
   break;
   offset += rbuf_len;
  
 - rc = crypto_shash_update(desc.shash, rbuf, rbuf_len);
 + rc = crypto_shash_update(shash, rbuf, rbuf_len);
   if (rc)
   break;
   }
 @@ -429,7 +426,7 @@ static int ima_calc_file_hash_tfm(struct file *file,
   kfree(rbuf);
  out:
   if (!rc)
 - rc = crypto_shash_final(desc.shash, hash-digest);
 + rc = crypto_shash_final(shash, hash-digest);
   return rc;
  }
  
 @@ -487,18 +484,15 @@ static int ima_calc_field_array_hash_tfm(struct 
 ima_field_data *field_data,
struct ima_digest_data *hash,
struct crypto_shash *tfm)
  {
 - struct {
 - struct shash_desc shash;
 - char ctx[crypto_shash_descsize(tfm)];
 - } desc;
 + SHASH_DESC_ON_STACK(shash, tfm);
   int rc, i;
  
 - desc.shash.tfm = tfm;
 - desc.shash.flags = 0;
 + shash-tfm = tfm;
 + shash-flags = 0;
  
   hash-length = crypto_shash_digestsize(tfm);
  
 - rc = crypto_shash_init(desc.shash);
 + rc = crypto_shash_init(shash);
   if (rc != 0)
   return rc;
  
 @@ -508,7 +502,7 @@ static int ima_calc_field_array_hash_tfm(struct 
 ima_field_data *field_data,
   u32 datalen = field_data[i].len;
  
   if (strcmp(td-name, IMA_TEMPLATE_IMA_NAME) != 0) {
 - rc = crypto_shash_update(desc.shash,
 + rc = crypto_shash_update(shash,
   (const u8 *) field_data[i].len,
   sizeof(field_data[i].len));
   if (rc)
 @@ -518,13 +512,13 @@ static int ima_calc_field_array_hash_tfm(struct 
 ima_field_data *field_data,
   data_to_hash = buffer;
   datalen = IMA_EVENT_NAME_LEN_MAX + 1;
   }
 - rc = crypto_shash_update(desc.shash, data_to_hash, datalen);
 + rc = crypto_shash_update(shash, data_to_hash, datalen);
   if (rc)
   break;
   }
  
   if (!rc)
 - rc = crypto_shash_final(desc.shash, hash-digest);
 + rc = crypto_shash_final(shash, hash-digest);
  
   return rc;
  }
 @@ -565,15 +559,12 @@ static int __init ima_calc_boot_aggregate_tfm(char 
 *digest,
  {
   u8 pcr_i[TPM_DIGEST_SIZE];
   int rc, i;
 - struct {
 - struct shash_desc shash;
 - char ctx[crypto_shash_descsize(tfm)];
 - } desc;
 + SHASH_DESC_ON_STACK(shash, tfm);
  
 - desc.shash.tfm = tfm;
 - desc.shash.flags = 0;
 + shash-tfm = tfm;
 + shash-flags = 0;
  
 - rc = crypto_shash_init(desc.shash);
 + rc = crypto_shash_init(shash);
   if (rc != 0)
   return rc;
  
 @@ -581,10 +572,10 @@ static int __init ima_calc_boot_aggregate_tfm(char 
 *digest,
   for (i = TPM_PCR0; i  TPM_PCR8; i++) {
   ima_pcrread(i, pcr_i);
   /* now accumulate with current aggregate */
 - rc = crypto_shash_update(desc.shash, pcr_i, TPM_DIGEST_SIZE);
 + rc = crypto_shash_update(shash, pcr_i, TPM_DIGEST_SIZE);
   }
 

Re: [PATCH 1/2] Btrfs: add missing end_page_writeback on submit_extent_page failure

2014-09-23 Thread Liu Bo
On Mon, Sep 22, 2014 at 05:41:04PM +0100, Filipe Manana wrote:
 If submit_extent_page() fails in write_one_eb(), we end up with the current
 page not marked dirty anymore, unlocked and marked for writeback. But we never
 end up calling end_page_writeback() against the page, which will make calls to
 filemap_fdatawait_range (e.g. at transaction commit time) hang forever waiting
 for the writeback bit to be cleared from the page.
 
 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/extent_io.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index 3af4966..91f866c 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -3668,6 +3668,7 @@ static noinline_for_stack int write_one_eb(struct 
 extent_buffer *eb,
   if (ret) {
   set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
   SetPageError(p);
 + end_page_writeback(p);

This is not always true, and it depends on the place where we get error in
submit_extent_page(), whether it has built and submitted a bio, if it's true,
case can be different as bio_endio will be called, and calling 
end_page_writeback()
again will end up with panic.

thanks,
-liubo
 
   if (atomic_sub_and_test(num_pages - i, eb-io_pages))
   end_extent_buffer_writeback(eb);
   ret = -EIO;
 -- 
 1.9.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Austin S Hemmelgarn

On 2014-09-22 16:51, Stefan G. Weichinger wrote:

Am 20.09.2014 um 11:32 schrieb Duncan:


What I do as part of my regular backup regime, is every few kernel cycles
I wipe the (first level) backup and do a fresh mkfs.btrfs, activating new
optional features as I believe appropriate.  Then I boot to the new
backup and run a bit to test it, then wipe the normal working copy and do
a fresh mkfs.btrfs on it, again with the new optional features enabled
that I want.


Is re-creating btrfs-filesystems *recommended* in any way?

Does that actually make a difference in the fs-structure?

I would recommend it, there are some newer features that you can only 
set at mkfs time.  Quite often, when a new feature is implemented, it is 
some time before things are such that it can be enabled online, and even 
then that doesn't convert anything until it is rewritten.

So far I assumed it was enough to keep the kernel up2date, use current
(stable) btrfs-progs and run some scrub every week or so (not to mention
backups .. if it ain't backed up, it was/isn't important).

Stefan







smime.p7s
Description: S/MIME Cryptographic Signature


mount problem

2014-09-23 Thread Simone Ferretti
Hi all,

we're testing BTRFS on our Debian server.  After a lot of operations
simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume
the kernel logs these messages:

[73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs: wr 33036, rd 0, flush 0, 
corrupt 2806, gen 0
[73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs: wr 244165, rd 0, flush 0, 
corrupt 1, gen 4

Everything seems to work nice but I'm courious to know what these
messages mean (in particular what do gen and corrupt mean?).

# uname -a
Linux dub 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux

# btrfs --version
Btrfs v3.16

# btrfs fi show
Label: 'btrfs_multiappliance'  uuid: 3452ffdd-c09b-43dd-9adb-cffde8518a72
Total devices 2 FS bytes used 20.03GiB
devid1 size 1.82TiB used 24.03GiB path /dev/etherd/e30.20
devid2 size 3.64TiB used 24.03GiB path /dev/etherd/e60.28

# btrfs fi df /media/multiapp
Data, RAID1: total=22.00GiB, used=20.01GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=21.23MiB
unknown, single: total=16.00MiB, used=0.00

# dmesg
[82932.655078] BTRFS info (device etherd/e30.20): disk space caching is enabled
[82932.678380] BTRFS: bdev /dev/etherd/e30.20 errs: wr 33036, rd 0, flush 0, 
corrupt 2806, gen 0
[82932.678388] BTRFS: bdev /dev/etherd/e60.28 errs: wr 244165, rd 0, flush 0, 
corrupt 1, gen 4


-- 
Thanks in advance,
Simone Ferretti
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.

2014-09-23 Thread Chris Mason
On 09/23/2014 01:40 AM, Qu Wenruo wrote:
 [BUG]
 Originally when mount btrfs with -o subvol= mount option, btrfs will
 lose all security lable.
 And if the btrfs fs is mounted somewhere else, due to the lost of
 security lable, SELinux will refuse to mount since the same super block
 is being mounted using different security lable.
 
 [REPRODUCER]
 With SELinux enabled:
  #mkfs -t btrfs /dev/sda5
  #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
  #btrfs subvolume create /mnt/btrfs/subvol
  #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
   /mnt/test
 
 kernel message:
 SELinux: mount invalid.  Same superblock, different security settings
 for (dev sda5, type btrfs)
 
 [REASON]
 This happens because btrfs will call vfs_kern_mount() and then
 mount_subtree() to handle subvolume name lookup.
 First mount will cut off all the security lables and when it comes to
 the second vfs_kern_mount(), it has no security label now.
 
 [FIX]
 This patch will makes btrfs behavior much more like nfs,
 which has the type flag FS_BINARY_MOUNTDATA,
 making btrfs handles the security label internally.
 So security label will be set in the real mount time and won't lose
 label when use with subvol= mount option.

Thanks for working on this.  Eric Sandeen (cc'd) was trying out
something similar recently, so I want to make sure this doesn't conflict
with his ideas.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests: btrfs: add test regarding clearing compression flag/property

2014-09-23 Thread David Disseldorp
On Thu, 11 Sep 2014 11:45:34 +0100, Filipe Manana wrote:

 Regression test for btrfs where removing the flag FS_COMPR_FL
 (chattr -c) from an inode wouldn't clear its compression property.
 This was fixed in the following linux kernel patch:
 
   Btrfs: add missing compression property remove in btrfs_ioctl_setflags
 
 Signed-off-by: Filipe Manana fdman...@suse.com

I would have expected this to fail with MOUNT_OPTIONS=-o compress=no,
but it looks as though Btrfs stores the attributes unconditionally.
Either way this change looks good to me.
Reviewed-by: David Disseldorp dd...@suse.de

Cheers, David
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs send does not work from readonly device

2014-09-23 Thread GEO
Let's say we have a snapshot called snapshot on our device /dev/sdb.
Now we boot a qemu machine and attatch the disk read only (since we do not
want to put our data at risk in the vm, and read only should be enough to
get our snapshot data).

I used

qemu-system-x86_64 -drive -file=/dev/sdb,if=scsi,readonly -hda
path_to_guest_operating_system

When we are in the qemu instance (suppose our device  /dev/sdb is mounted
on /mnt/import), we want to import our data now using btrfs send/receive:

btrfs send /mnt/import/snapshot | btrfs receive  /

we get the following problems in syslog due to the fact that our device is
read only:

delayed_refs has NO entry
BTRFS warning (device sdb): btrfs_update_root:151: Aborting unsused
transaction(No space left)
BTRFS warning (device sdb): Skipping commit of aborted transaction.
BTRFS warning (device sdb): cleanup_transaction:1547: Aborting unused
transaction (No space left)

Apart from the snapshot folder being created in / nothing happens, the
process simply hangs forever, with no actual data getting imported.

Is that supposed to be that way? Why is readonly not enough to import data
using btrfs send?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: add missing end_page_writeback on submit_extent_page failure

2014-09-23 Thread Filipe David Manana
On Tue, Sep 23, 2014 at 11:14 AM, Liu Bo bo.li@oracle.com wrote:
 On Mon, Sep 22, 2014 at 05:41:04PM +0100, Filipe Manana wrote:
 If submit_extent_page() fails in write_one_eb(), we end up with the current
 page not marked dirty anymore, unlocked and marked for writeback. But we 
 never
 end up calling end_page_writeback() against the page, which will make calls 
 to
 filemap_fdatawait_range (e.g. at transaction commit time) hang forever 
 waiting
 for the writeback bit to be cleared from the page.

 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/extent_io.c | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index 3af4966..91f866c 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -3668,6 +3668,7 @@ static noinline_for_stack int write_one_eb(struct 
 extent_buffer *eb,
   if (ret) {
   set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
   SetPageError(p);
 + end_page_writeback(p);

 This is not always true, and it depends on the place where we get error in
 submit_extent_page(), whether it has built and submitted a bio, if it's true,
 case can be different as bio_endio will be called, and calling 
 end_page_writeback()
 again will end up with panic.

No, it's always true when the caller is write_one_eb(). If
submit_extent_page() returns an error, we're sure that the page wasn't
added to a bio, because bio_ret is always not NULL (but *bio_ret can
be). Also, if we call submit_one_bio() in submit_extent_page() (first
if statement) , we know that the bio doesn't contain the page.

thanks


 thanks,
 -liubo

   if (atomic_sub_and_test(num_pages - i, eb-io_pages))
   end_extent_buffer_writeback(eb);
   ret = -EIO;
 --
 1.9.1

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Stefan G. Weichinger
Am 23.09.2014 um 14:08 schrieb Austin S Hemmelgarn:
 On 2014-09-22 16:51, Stefan G. Weichinger wrote:
 Is re-creating btrfs-filesystems *recommended* in any way?

 Does that actually make a difference in the fs-structure?

 I would recommend it, there are some newer features that you can only
 set at mkfs time.  Quite often, when a new feature is implemented, it is
 some time before things are such that it can be enabled online, and even
 then that doesn't convert anything until it is rewritten.

What features for example?

I created my main btrfs a few months ago and would like to avoid
recreating it as this would mean restoring my root-fs on my main
workstation.

Although I would do it if it is worth it ;-)

I assume I could read some kind of version number out of the superblock
or so?

btrfs-show-super ?

S



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Austin S Hemmelgarn

On 2014-09-23 09:06, Stefan G. Weichinger wrote:

Am 23.09.2014 um 14:08 schrieb Austin S Hemmelgarn:

On 2014-09-22 16:51, Stefan G. Weichinger wrote:

Is re-creating btrfs-filesystems *recommended* in any way?

Does that actually make a difference in the fs-structure?


I would recommend it, there are some newer features that you can only
set at mkfs time.  Quite often, when a new feature is implemented, it is
some time before things are such that it can be enabled online, and even
then that doesn't convert anything until it is rewritten.


What features for example?
Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the 
following list of features:

mixed-bg- mixed data and metadata block groups
extref  - increased hard-link limit per file to 65536
raid56  - raid56 extended format
skinny-metadata - reduced size metadata extent refs
no-holes- no explicit hole extents for files

mixed-bg is something that you generally wouldn't want to change after mkfs.
extref can be enabled online, and the filesystem metadata gets updated 
as-needed, and dosen't provide any real performance improvement (but is 
needed for some mail servers that have HUGE mail-queues)
I don't know anything about the raid56 option, but there isn't any way 
to change it after mkfs.
skinyy-metadata can be changed online, and the format gets updated on 
rewrite of each metadata block.  This one does provide a performance 
improvement (stat() in particular runs noticeably faster).  You should 
probably enable this if it isn't already enabled, even if you don't 
recreate your filesystem.
no-holes cannot currently be changed online, and is a very recent 
addition (post v3.14 btrfs-progs I believe) that provides improved 
performance for sparse files (which is particularly useful if you are 
doing things with fixed size virtual machine disk images).


It's this last one that prompted me personally to recreate my 
filesystems most recently, as I use sparse files to save space as much 
as possible.


I created my main btrfs a few months ago and would like to avoid
recreating it as this would mean restoring my root-fs on my main
workstation.

Although I would do it if it is worth it ;-)

I assume I could read some kind of version number out of the superblock
or so?

btrfs-show-super ?

AFAIK there isn't really any 'version number' that has any meaning in 
the superblock (except for telling the kernel that it uses the stable 
disk layout), however, there are flag bits that you can look for 
(compat_flags, compat_ro_flags, and incompat_flags).  I'm not 100% 
certain what each bit means, but on my system with a only 1 month old 
BTRFS filesystem, with extref, skinny-metadata, and no-holes turned on, 
i have compat_flags: 0x0, compat_ro_flags: 0x0, and incompat_flags: 0x16b.


The other potentially significant thing is that the default 
nodesize/leafsize has changed recently from 4096 to 16384, as that gives 
somewhat better performance for most use cases.





smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 1/2] Btrfs: add missing end_page_writeback on submit_extent_page failure

2014-09-23 Thread Liu Bo
On Tue, Sep 23, 2014 at 02:03:07PM +0100, Filipe David Manana wrote:
 On Tue, Sep 23, 2014 at 11:14 AM, Liu Bo bo.li@oracle.com wrote:
  On Mon, Sep 22, 2014 at 05:41:04PM +0100, Filipe Manana wrote:
  If submit_extent_page() fails in write_one_eb(), we end up with the current
  page not marked dirty anymore, unlocked and marked for writeback. But we 
  never
  end up calling end_page_writeback() against the page, which will make 
  calls to
  filemap_fdatawait_range (e.g. at transaction commit time) hang forever 
  waiting
  for the writeback bit to be cleared from the page.
 
  Signed-off-by: Filipe Manana fdman...@suse.com
  ---
   fs/btrfs/extent_io.c | 1 +
   1 file changed, 1 insertion(+)
 
  diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
  index 3af4966..91f866c 100644
  --- a/fs/btrfs/extent_io.c
  +++ b/fs/btrfs/extent_io.c
  @@ -3668,6 +3668,7 @@ static noinline_for_stack int write_one_eb(struct 
  extent_buffer *eb,
if (ret) {
set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
SetPageError(p);
  + end_page_writeback(p);
 
  This is not always true, and it depends on the place where we get error in
  submit_extent_page(), whether it has built and submitted a bio, if it's 
  true,
  case can be different as bio_endio will be called, and calling 
  end_page_writeback()
  again will end up with panic.
 
 No, it's always true when the caller is write_one_eb(). If
 submit_extent_page() returns an error, we're sure that the page wasn't
 added to a bio, because bio_ret is always not NULL (but *bio_ret can
 be). Also, if we call submit_one_bio() in submit_extent_page() (first
 if statement) , we know that the bio doesn't contain the page.

I see, you're right.

Reviewed-by: Liu Bo bo.li@oracle.com

thanks,
-liubo

 
 thanks
 
 
  thanks,
  -liubo
 
if (atomic_sub_and_test(num_pages - i, 
  eb-io_pages))
end_extent_buffer_writeback(eb);
ret = -EIO;
  --
  1.9.1
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-btrfs in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  --
  To unsubscribe from this list: send the line unsubscribe linux-btrfs in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 
 -- 
 Filipe David Manana,
 
 Reasonable men adapt themselves to the world.
  Unreasonable men adapt the world to themselves.
  That's why all progress depends on unreasonable men.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Stefan G. Weichinger
Am 23.09.2014 um 15:38 schrieb Austin S Hemmelgarn:
 On 2014-09-23 09:06, Stefan G. Weichinger wrote:
 What features for example?
 Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the
 following list of features:
 mixed-bg- mixed data and metadata block groups
 extref- increased hard-link limit per file to 65536
 raid56- raid56 extended format
 skinny-metadata- reduced size metadata extent refs
 no-holes- no explicit hole extents for files
 
 mixed-bg is something that you generally wouldn't want to change after
 mkfs.
 extref can be enabled online, and the filesystem metadata gets updated
 as-needed, and dosen't provide any real performance improvement (but is
 needed for some mail servers that have HUGE mail-queues)

ok, not needed here

 I don't know anything about the raid56 option, but there isn't any way
 to change it after mkfs.

not needed in my systems.

 skinyy-metadata can be changed online, and the format gets updated on
 rewrite of each metadata block.  This one does provide a performance
 improvement (stat() in particular runs noticeably faster).  You should
 probably enable this if it isn't already enabled, even if you don't
 recreate your filesystem.

So this is done via btrfstune, right?

I will give that a try, for my rootfs it doesn't allow me right now as
it is obviously mounted (live-cd, right?).

 no-holes cannot currently be changed online, and is a very recent
 addition (post v3.14 btrfs-progs I believe) that provides improved
 performance for sparse files (which is particularly useful if you are
 doing things with fixed size virtual machine disk images).

Yes, I have some of those!

 AFAIK there isn't really any 'version number' that has any meaning in
 the superblock (except for telling the kernel that it uses the stable
 disk layout), however, there are flag bits that you can look for
 (compat_flags, compat_ro_flags, and incompat_flags).  I'm not 100%
 certain what each bit means, but on my system with a only 1 month old
 BTRFS filesystem, with extref, skinny-metadata, and no-holes turned on,
 i have compat_flags: 0x0, compat_ro_flags: 0x0, and incompat_flags: 0x16b.
 
 The other potentially significant thing is that the default
 nodesize/leafsize has changed recently from 4096 to 16384, as that gives
 somewhat better performance for most use cases.

I have the 16k for both already.

Thanks for your explanations, I will dig into it as soon as I find the
time. Seems I have to backup/restore quite some stuff ;-)

Stefan

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Sparse Warnings about locks in extent-tree.c

2014-09-23 Thread David Sterba
On Tue, Sep 23, 2014 at 02:45:44PM +0800, Liu Bo wrote:
 On Mon, Sep 22, 2014 at 09:57:35PM -0400, nick wrote:
  Hello Btfs Developers,
  I am new so am unsure of how to fix this but we are hitting some sparse 
  warnings about unlock/lock is having 
  a wrong count when exiting certain functions in extent-tree.c. I will paste 
  the warnings below for you guys.
  Hope this is of help.
  Cheers,
  Nick  
  fs/btrfs/extent-tree.c:6386:39: warning: context imbalance in 
  'btrfs_lock_cluster' - wrong count at exit
  fs/btrfs/extent-tree.c:6663:44: warning: context imbalance in 
  'find_free_extent' - unexpected unlock
  fs/btrfs/extent-tree.c:8810:9: warning: context imbalance in 
  'btrfs_put_block_group_cache' - wrong count at exit
 
 What program is producing these warning?  It looks like a static code analysis
 tool or something.

https://sparse.wiki.kernel.org/index.php/Main_Page

It understands a few semantics like the locking and reports when there's
an unannotated function that has imbalanced locking, like the errors
above.

Any static analyzer has some ratio of false positives so not every
report means there's a bug. One would expect that a function
'btrfs_lock_cluster' would act as a lock and leaves some locks held at
exit. The fix is to annotate it a after review.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: be aware of btree inode write errors to avoid fs corruption

2014-09-23 Thread Liu Bo
On Mon, Sep 22, 2014 at 05:41:05PM +0100, Filipe Manana wrote:
 While we have a transaction ongoing, the VM might decide at any time
 to call btree_inode-i_mapping-a_ops-writepages(), which will start
 writeback of dirty pages belonging to btree nodes/leafs. This call
 might return an error or the writeback might finish with an error
 before we attempt to commit the running transaction. If this happens,
 we might have no way of knowing that such error happened when we are
 committing the transaction - because the pages might no longer be
 marked dirty nor tagged for writeback (if a subsequent modification
 to the extent buffer didn't happen before the transaction commit) which
 makes filemap_fdata[write|wait]_range unable to find such pages (even
 if they're marked with SetPageError).
 So if this happens we must abort the transaction, otherwise we commit
 a super block with btree roots that point to btree nodes/leafs whose
 content on disk is invalid - either garbage or the content of some
 node/leaf from a past generation that got cowed or deleted and is no
 longer valid (for this later case we end up getting error messages like
 parent transid verify failed on 10826481664 wanted 25748 found 29562
 when reading btree nodes/leafs from disk).

Good catch!

 
 Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
 i_mapping would not be enough because we need to distinguish between
 log tree extents (not fatal) vs non-log tree extents (fatal) and
 because the next call to filemap_fdatawait_range() will catch and clear
 such errors in the mapping - and that call might be from a log sync and
 not from a transaction commit, which means we would not know about the
 error at transaction commit time. Also, checking for the eb flag
 EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
 not be completely reliable, as the eb might be removed from memory and
 read back when trying to get it, which clears that flag right before
 reading the eb's pages from disk, making us not know about the previous
 write error.
 
 Using the BTRFS_INODE_BTREE_IO_ERR and BTRFS_INODE_BTREE_LOG_IO_ERR
 inode flags also makes us achieve the goal of AS_EIO/AS_ENOSPC when
 writepages() returns success, started writeback for all dirty pages
 and before filemap_fdatawait_range() is called, the writeback for
 all dirty pages had already finished with errors - because we were
 not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return
 success, as it could not know that writeback errors happened (the
 pages were no longer tagged for writeback).

But there is a problem when the extent buffer with IO error is COWed and
deleted, but the BTRFS_INODE_BTREE_IO_ERR flag still remains in
btree_inode's runtime_flags, we'd still run into cleanup_transaction() as this
patch expects.  So I think we need to remove that BTREE_IO_ERR flag in this
case.

Well, I notice that you don't clear rest pages' DIRTY in the error case, so it
can lead to hitting the BUG_ON in btrfs_release_extent_buffer_page() when the eb
with IO error is COWed by a new good eb and gets itself freed then.  I'm making
a patch to add missing clear_page_dirty_for_io().

thanks,
-liubo

 
 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/btrfs_inode.h |  2 ++
  fs/btrfs/extent_io.c   | 69 
 +++---
  fs/btrfs/transaction.c | 20 ---
  fs/btrfs/transaction.h |  3 +--
  fs/btrfs/tree-log.c| 13 ++
  5 files changed, 93 insertions(+), 14 deletions(-)
 
 diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
 index 3511031..dbe37dc 100644
 --- a/fs/btrfs/btrfs_inode.h
 +++ b/fs/btrfs/btrfs_inode.h
 @@ -44,6 +44,8 @@
  #define BTRFS_INODE_IN_DELALLOC_LIST 9
  #define BTRFS_INODE_READDIO_NEED_LOCK10
  #define BTRFS_INODE_HAS_PROPS11
 +#define BTRFS_INODE_BTREE_IO_ERR 12
 +#define BTRFS_INODE_BTREE_LOG_IO_ERR 13
  
  /* in memory btrfs inode */
  struct btrfs_inode {
 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index 91f866c..33b113b 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -20,6 +20,7 @@
  #include locking.h
  #include rcu-string.h
  #include backref.h
 +#include transaction.h
  
  static struct kmem_cache *extent_state_cache;
  static struct kmem_cache *extent_buffer_cache;
 @@ -3606,6 +3607,68 @@ static void end_extent_buffer_writeback(struct 
 extent_buffer *eb)
   wake_up_bit(eb-bflags, EXTENT_BUFFER_WRITEBACK);
  }
  
 +static void set_btree_ioerr(struct page *page, int err)
 +{
 + struct extent_buffer *eb = (struct extent_buffer *)page-private;
 + const u64 start = eb-start;
 + const u64 end = eb-start + eb-len - 1;
 + struct btrfs_fs_info *fs_info = eb-fs_info;
 + int ret;
 +
 + set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
 + SetPageError(page);
 +
 + /*
 +  * If writeback for a btree extent that doesn't belong to a log tree
 + 

Re: Sparse Warnings about locks in extent-tree.c

2014-09-23 Thread Liu Bo
On Tue, Sep 23, 2014 at 04:01:41PM +0200, David Sterba wrote:
 On Tue, Sep 23, 2014 at 02:45:44PM +0800, Liu Bo wrote:
  On Mon, Sep 22, 2014 at 09:57:35PM -0400, nick wrote:
   Hello Btfs Developers,
   I am new so am unsure of how to fix this but we are hitting some sparse 
   warnings about unlock/lock is having 
   a wrong count when exiting certain functions in extent-tree.c. I will 
   paste the warnings below for you guys.
   Hope this is of help.
   Cheers,
   Nick  
   fs/btrfs/extent-tree.c:6386:39: warning: context imbalance in 
   'btrfs_lock_cluster' - wrong count at exit
   fs/btrfs/extent-tree.c:6663:44: warning: context imbalance in 
   'find_free_extent' - unexpected unlock
   fs/btrfs/extent-tree.c:8810:9: warning: context imbalance in 
   'btrfs_put_block_group_cache' - wrong count at exit
  
  What program is producing these warning?  It looks like a static code 
  analysis
  tool or something.
 
 https://sparse.wiki.kernel.org/index.php/Main_Page
 
 It understands a few semantics like the locking and reports when there's
 an unannotated function that has imbalanced locking, like the errors
 above.

Cool, thanks Dave.

-liubo

 
 Any static analyzer has some ratio of false positives so not every
 report means there's a bug. One would expect that a function
 'btrfs_lock_cluster' would act as a lock and leaves some locks held at
 exit. The fix is to annotate it a after review.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix crash of btrfs_release_extent_buffer_page

2014-09-23 Thread Liu Bo
This is actually inspired by Filipe's patch.  When write_one_eb() fails on
submit_extent_page(), it'll give up writing this eb and mark it with
EXTENT_BUFFER_IOERR.  So if it's not the last page that encounter the failure,
there are some left pages which remain DIRTY, and if a later COW on this eb
happens, ie. eb is COWed and freed, it'd run into BUG_ON in
btrfs_release_extent_buffer_page() for the DIRTY page, ie. 
BUG_ON(PageDirty(page));

This adds the missing clear_page_dirty_for_io() for the rest pages of eb.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index af0359d..e68a1d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3597,6 +3597,7 @@ static noinline_for_stack int write_one_eb(struct 
extent_buffer *eb,
if (unlikely(ret)) {
for (; i  num_pages; i++) {
struct page *p = extent_buffer_page(eb, i);
+   clear_page_dirty_for_io(p);
unlock_page(p);
}
}
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Tobias Holst
If it is unknown, which of these options have been used at btrfs
creation time - is it possible to check the state of these options
afterwards on a mounted or unmounted filesystem?


2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com:

 Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the 
 following list of features:
 mixed-bg- mixed data and metadata block groups
 extref  - increased hard-link limit per file to 65536
 raid56  - raid56 extended format
 skinny-metadata - reduced size metadata extent refs
 no-holes- no explicit hole extents for files
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: be aware of btree inode write errors to avoid fs corruption

2014-09-23 Thread Filipe David Manana
On Tue, Sep 23, 2014 at 3:05 PM, Liu Bo bo.li@oracle.com wrote:
 On Mon, Sep 22, 2014 at 05:41:05PM +0100, Filipe Manana wrote:
 While we have a transaction ongoing, the VM might decide at any time
 to call btree_inode-i_mapping-a_ops-writepages(), which will start
 writeback of dirty pages belonging to btree nodes/leafs. This call
 might return an error or the writeback might finish with an error
 before we attempt to commit the running transaction. If this happens,
 we might have no way of knowing that such error happened when we are
 committing the transaction - because the pages might no longer be
 marked dirty nor tagged for writeback (if a subsequent modification
 to the extent buffer didn't happen before the transaction commit) which
 makes filemap_fdata[write|wait]_range unable to find such pages (even
 if they're marked with SetPageError).
 So if this happens we must abort the transaction, otherwise we commit
 a super block with btree roots that point to btree nodes/leafs whose
 content on disk is invalid - either garbage or the content of some
 node/leaf from a past generation that got cowed or deleted and is no
 longer valid (for this later case we end up getting error messages like
 parent transid verify failed on 10826481664 wanted 25748 found 29562
 when reading btree nodes/leafs from disk).

 Good catch!


 Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
 i_mapping would not be enough because we need to distinguish between
 log tree extents (not fatal) vs non-log tree extents (fatal) and
 because the next call to filemap_fdatawait_range() will catch and clear
 such errors in the mapping - and that call might be from a log sync and
 not from a transaction commit, which means we would not know about the
 error at transaction commit time. Also, checking for the eb flag
 EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
 not be completely reliable, as the eb might be removed from memory and
 read back when trying to get it, which clears that flag right before
 reading the eb's pages from disk, making us not know about the previous
 write error.

 Using the BTRFS_INODE_BTREE_IO_ERR and BTRFS_INODE_BTREE_LOG_IO_ERR
 inode flags also makes us achieve the goal of AS_EIO/AS_ENOSPC when
 writepages() returns success, started writeback for all dirty pages
 and before filemap_fdatawait_range() is called, the writeback for
 all dirty pages had already finished with errors - because we were
 not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return
 success, as it could not know that writeback errors happened (the
 pages were no longer tagged for writeback).

 But there is a problem when the extent buffer with IO error is COWed and
 deleted, but the BTRFS_INODE_BTREE_IO_ERR flag still remains in
 btree_inode's runtime_flags, we'd still run into cleanup_transaction() as this
 patch expects.  So I think we need to remove that BTREE_IO_ERR flag in this
 case.

Thought about that, no good reason to not do it (yet at least). I'll
think a bit about it and send a v2 if I can't find a reason.


 Well, I notice that you don't clear rest pages' DIRTY in the error case,

Where exactly?

 so it
 can lead to hitting the BUG_ON in btrfs_release_extent_buffer_page() when the 
 eb
 with IO error is COWed by a new good eb and gets itself freed then.  I'm 
 making
 a patch to add missing clear_page_dirty_for_io().

clear_page_dirty_for_io() is already called by write_one_eb(), and
immediately after it tags the page(s) for writeback.
If error happens during writeback, end_bio_extent_buffer_writepage()
removes the writeback tag (from page and eb's flags).
So either I'm missing something, or you didn't notice all this.

thanks


 thanks,
 -liubo


 Signed-off-by: Filipe Manana fdman...@suse.com
 ---
  fs/btrfs/btrfs_inode.h |  2 ++
  fs/btrfs/extent_io.c   | 69 
 +++---
  fs/btrfs/transaction.c | 20 ---
  fs/btrfs/transaction.h |  3 +--
  fs/btrfs/tree-log.c| 13 ++
  5 files changed, 93 insertions(+), 14 deletions(-)

 diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
 index 3511031..dbe37dc 100644
 --- a/fs/btrfs/btrfs_inode.h
 +++ b/fs/btrfs/btrfs_inode.h
 @@ -44,6 +44,8 @@
  #define BTRFS_INODE_IN_DELALLOC_LIST 9
  #define BTRFS_INODE_READDIO_NEED_LOCK10
  #define BTRFS_INODE_HAS_PROPS11
 +#define BTRFS_INODE_BTREE_IO_ERR 12
 +#define BTRFS_INODE_BTREE_LOG_IO_ERR 13

  /* in memory btrfs inode */
  struct btrfs_inode {
 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index 91f866c..33b113b 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -20,6 +20,7 @@
  #include locking.h
  #include rcu-string.h
  #include backref.h
 +#include transaction.h

  static struct kmem_cache *extent_state_cache;
  static struct kmem_cache *extent_buffer_cache;
 @@ -3606,6 +3607,68 @@ static void 

Re: [PATCH] Btrfs: fix crash of btrfs_release_extent_buffer_page

2014-09-23 Thread Filipe David Manana
On Tue, Sep 23, 2014 at 3:22 PM, Liu Bo bo.li@oracle.com wrote:
 This is actually inspired by Filipe's patch.  When write_one_eb() fails on
 submit_extent_page(), it'll give up writing this eb and mark it with
 EXTENT_BUFFER_IOERR.  So if it's not the last page that encounter the failure,
 there are some left pages which remain DIRTY, and if a later COW on this eb
 happens, ie. eb is COWed and freed, it'd run into BUG_ON in
 btrfs_release_extent_buffer_page() for the DIRTY page, ie. 
 BUG_ON(PageDirty(page));

 This adds the missing clear_page_dirty_for_io() for the rest pages of eb.

 Signed-off-by: Liu Bo bo.li@oracle.com

Looks good.
Thanks.

Reviewed-by: Filipe Manana fdman...@suse.com

 ---
  fs/btrfs/extent_io.c | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index af0359d..e68a1d9 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -3597,6 +3597,7 @@ static noinline_for_stack int write_one_eb(struct 
 extent_buffer *eb,
 if (unlikely(ret)) {
 for (; i  num_pages; i++) {
 struct page *p = extent_buffer_page(eb, i);
 +   clear_page_dirty_for_io(p);
 unlock_page(p);
 }
 }
 --
 1.8.1.4

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Austin S Hemmelgarn

On 2014-09-23 10:23, Tobias Holst wrote:

If it is unknown, which of these options have been used at btrfs
creation time - is it possible to check the state of these options
afterwards on a mounted or unmounted filesystem?


2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com
mailto:ahferro...@gmail.com:

Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives
the following list of features:
mixed-bg- mixed data and metadata block groups
extref  - increased hard-link limit per file to 65536
raid56  - raid56 extended format
skinny-metadata - reduced size metadata extent refs
no-holes- no explicit hole extents for files

I don't think there is a specific tool for doing this, but some of them 
do show up in dmesg, for example skinny-metadata shows up as a mention 
of the FS having skinny extents.




smime.p7s
Description: S/MIME Cryptographic Signature


Re: fixes for btrfs check --repair

2014-09-23 Thread David Sterba
Hi,

On Sat, Aug 30, 2014 at 01:00:40PM -0300, Alexandre Oliva wrote:
 Here are the patches.

thanks, I've put them into the queue (branch with other fsck fixes).
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] btrfs-progs: Add simple python front end to the search ioctl

2014-09-23 Thread Chris Mason

This is a starting point for a debugfs style python interface using
the search ioctl.  For now it can only do one thing, which is to
print out all the extents in a file and calculate the compression ratio.

Over time it will grow more features, especially for the kinds of things
we might run btrfs-debug-tree to find out.  Expect the usage and output
to change dramatically over time (don't hard code to it).

Signed-off-by: Chris Mason c...@fb.com
---
 btrfs-debugfs | 296 ++
 1 file changed, 296 insertions(+)
 create mode 100755 btrfs-debugfs

diff --git a/btrfs-debugfs b/btrfs-debugfs
new file mode 100755
index 000..cf1d285
--- /dev/null
+++ b/btrfs-debugfs
@@ -0,0 +1,296 @@
+#!/usr/bin/env python2
+#
+# Simple python program to print out all the extents of a single file
+# LGPLv2 license
+# Copyright Facebook 2014
+
+import sys,os,struct,fcntl,ctypes,stat
+
+# helpers for max ints
+maxu64 = (1L  64) - 1
+maxu32 = (1L  32) - 1
+
+# the inode (like form stat)
+BTRFS_INODE_ITEM_KEY = 1
+# backref to the directory
+BTRFS_INODE_REF_KEY = 12
+# backref to the directory v2
+BTRFS_INODE_EXTREF_KEY = 13
+# xattr items
+BTRFS_XATTR_ITEM_KEY = 24
+# orphans for list files
+BTRFS_ORPHAN_ITEM_KEY = 48
+# treelog items for dirs
+BTRFS_DIR_LOG_ITEM_KEY = 60
+BTRFS_DIR_LOG_INDEX_KEY = 72
+# dir items and dir indexes both hold filenames
+BTRFS_DIR_ITEM_KEY = 84
+BTRFS_DIR_INDEX_KEY = 96
+# these are the file extent pointers
+BTRFS_EXTENT_DATA_KEY = 108
+# csums
+BTRFS_EXTENT_CSUM_KEY = 128
+# root item for subvols and snapshots
+BTRFS_ROOT_ITEM_KEY = 132
+# root item backrefs
+BTRFS_ROOT_BACKREF_KEY = 144
+BTRFS_ROOT_REF_KEY = 156
+# each allocated extent has an extent item
+BTRFS_EXTENT_ITEM_KEY = 168
+# optimized extents for metadata only
+BTRFS_METADATA_ITEM_KEY = 169
+# backrefs for extents
+BTRFS_TREE_BLOCK_REF_KEY = 176
+BTRFS_EXTENT_DATA_REF_KEY = 178
+BTRFS_EXTENT_REF_V0_KEY = 180
+BTRFS_SHARED_BLOCK_REF_KEY = 182
+BTRFS_SHARED_DATA_REF_KEY = 184
+# one of these for each block group
+BTRFS_BLOCK_GROUP_ITEM_KEY = 192
+# dev extents records which part of each device is allocated
+BTRFS_DEV_EXTENT_KEY = 204
+# dev items describe devs
+BTRFS_DEV_ITEM_KEY = 216
+# one for each chunk
+BTRFS_CHUNK_ITEM_KEY = 228
+# qgroup info
+BTRFS_QGROUP_STATUS_KEY = 240
+BTRFS_QGROUP_INFO_KEY = 242
+BTRFS_QGROUP_LIMIT_KEY = 244
+BTRFS_QGROUP_RELATION_KEY = 246
+# records balance progress
+BTRFS_BALANCE_ITEM_KEY = 248
+# stats on device errors
+BTRFS_DEV_STATS_KEY = 249
+BTRFS_DEV_REPLACE_KEY = 250
+BTRFS_STRING_ITEM_KEY = 253
+
+# in the kernel sources, this is flattened
+# btrfs_ioctl_search_args_v2.  It includes both the btrfs_ioctl_search_key
+# and the buffer.  We're using a 64K buffer size.
+#
+args_buffer_size = 65536
+class btrfs_ioctl_search_args(ctypes.Structure):
+_pack_ = 1
+_fields_ = [ (tree_id, ctypes.c_ulonglong),
+ (min_objectid, ctypes.c_ulonglong),
+ (max_objectid, ctypes.c_ulonglong),
+ (min_offset, ctypes.c_ulonglong),
+ (max_offset, ctypes.c_ulonglong),
+ (min_transid, ctypes.c_ulonglong),
+ (max_transid, ctypes.c_ulonglong),
+ (min_type, ctypes.c_uint),
+ (max_type, ctypes.c_uint),
+ (nr_items, ctypes.c_uint),
+ (unused, ctypes.c_uint),
+ (unused1, ctypes.c_ulonglong),
+ (unused2, ctypes.c_ulonglong),
+ (unused3, ctypes.c_ulonglong),
+ (unused4, ctypes.c_ulonglong),
+ (buf_size, ctypes.c_ulonglong),
+ (buf, ctypes.c_ubyte * args_buffer_size),
+   ]
+
+# the search ioctl resturns one header for each item
+#
+class btrfs_ioctl_search_header(ctypes.Structure):
+_pack_ = 1
+_fields_ = [ (transid, ctypes.c_ulonglong),
+ (objectid, ctypes.c_ulonglong),
+ (offset, ctypes.c_ulonglong),
+ (type, ctypes.c_uint),
+ (len, ctypes.c_uint),
+   ]
+
+# the type field in btrfs_file_extent_item
+BTRFS_FILE_EXTENT_INLINE = 0
+BTRFS_FILE_EXTENT_REG = 1
+BTRFS_FILE_EXTENT_PREALLOC = 2
+
+class btrfs_file_extent_item(ctypes.LittleEndianStructure):
+_pack_ = 1
+_fields_ = [ (generation, ctypes.c_ulonglong),
+ (ram_bytes, ctypes.c_ulonglong),
+ (compression, ctypes.c_ubyte),
+ (encryption, ctypes.c_ubyte),
+ (other_encoding, ctypes.c_ubyte * 2),
+ (type, ctypes.c_ubyte),
+ (disk_bytenr, ctypes.c_ulonglong),
+ (disk_num_bytes, ctypes.c_ulonglong),
+ (offset, ctypes.c_ulonglong),
+ (num_bytes, ctypes.c_ulonglong),
+   ]
+
+class btrfs_ioctl_search():
+def __init__(self):
+self.args = btrfs_ioctl_search_args()
+self.args.tree_id = 0
+

Re: [PATCH RFC] btrfs-progs: Add simple python front end to the search ioctl

2014-09-23 Thread cwillu
On Tue, Sep 23, 2014 at 10:39 AM, Chris Mason c...@fb.com wrote:

 This is a starting point for a debugfs style python interface using
 the search ioctl.  For now it can only do one thing, which is to
 print out all the extents in a file and calculate the compression ratio.

 Over time it will grow more features, especially for the kinds of things
 we might run btrfs-debug-tree to find out.  Expect the usage and output
 to change dramatically over time (don't hard code to it).

 Signed-off-by: Chris Mason c...@fb.com
 ---
  btrfs-debugfs | 296 
 ++
  1 file changed, 296 insertions(+)
  create mode 100755 btrfs-debugfs

 diff --git a/btrfs-debugfs b/btrfs-debugfs
 new file mode 100755
 index 000..cf1d285
 --- /dev/null
 +++ b/btrfs-debugfs
 @@ -0,0 +1,296 @@
 +#!/usr/bin/env python2
 +#
 +# Simple python program to print out all the extents of a single file
 +# LGPLv2 license
 +# Copyright Facebook 2014
 +
 +import sys,os,struct,fcntl,ctypes,stat
 +
 +# helpers for max ints
 +maxu64 = (1L  64) - 1
 +maxu32 = (1L  32) - 1
 +
 +# the inode (like form stat)
 +BTRFS_INODE_ITEM_KEY = 1
 +# backref to the directory
 +BTRFS_INODE_REF_KEY = 12
 +# backref to the directory v2
 +BTRFS_INODE_EXTREF_KEY = 13
 +# xattr items
 +BTRFS_XATTR_ITEM_KEY = 24
 +# orphans for list files
 +BTRFS_ORPHAN_ITEM_KEY = 48
 +# treelog items for dirs
 +BTRFS_DIR_LOG_ITEM_KEY = 60
 +BTRFS_DIR_LOG_INDEX_KEY = 72
 +# dir items and dir indexes both hold filenames
 +BTRFS_DIR_ITEM_KEY = 84
 +BTRFS_DIR_INDEX_KEY = 96
 +# these are the file extent pointers
 +BTRFS_EXTENT_DATA_KEY = 108
 +# csums
 +BTRFS_EXTENT_CSUM_KEY = 128
 +# root item for subvols and snapshots
 +BTRFS_ROOT_ITEM_KEY = 132
 +# root item backrefs
 +BTRFS_ROOT_BACKREF_KEY = 144
 +BTRFS_ROOT_REF_KEY = 156
 +# each allocated extent has an extent item
 +BTRFS_EXTENT_ITEM_KEY = 168
 +# optimized extents for metadata only
 +BTRFS_METADATA_ITEM_KEY = 169
 +# backrefs for extents
 +BTRFS_TREE_BLOCK_REF_KEY = 176
 +BTRFS_EXTENT_DATA_REF_KEY = 178
 +BTRFS_EXTENT_REF_V0_KEY = 180
 +BTRFS_SHARED_BLOCK_REF_KEY = 182
 +BTRFS_SHARED_DATA_REF_KEY = 184
 +# one of these for each block group
 +BTRFS_BLOCK_GROUP_ITEM_KEY = 192
 +# dev extents records which part of each device is allocated
 +BTRFS_DEV_EXTENT_KEY = 204
 +# dev items describe devs
 +BTRFS_DEV_ITEM_KEY = 216
 +# one for each chunk
 +BTRFS_CHUNK_ITEM_KEY = 228
 +# qgroup info
 +BTRFS_QGROUP_STATUS_KEY = 240
 +BTRFS_QGROUP_INFO_KEY = 242
 +BTRFS_QGROUP_LIMIT_KEY = 244
 +BTRFS_QGROUP_RELATION_KEY = 246
 +# records balance progress
 +BTRFS_BALANCE_ITEM_KEY = 248
 +# stats on device errors
 +BTRFS_DEV_STATS_KEY = 249
 +BTRFS_DEV_REPLACE_KEY = 250
 +BTRFS_STRING_ITEM_KEY = 253
 +
 +# in the kernel sources, this is flattened
 +# btrfs_ioctl_search_args_v2.  It includes both the btrfs_ioctl_search_key
 +# and the buffer.  We're using a 64K buffer size.
 +#
 +args_buffer_size = 65536
 +class btrfs_ioctl_search_args(ctypes.Structure):

Put comments like these in triple-quoted strings just inside the class
or function you're defining; this makes them accessible using the
standard help() system:

class foo(bar):

In the kernel sources, this is

 +_pack_ = 1
 +_fields_ = [ (tree_id, ctypes.c_ulonglong),
 + (min_objectid, ctypes.c_ulonglong),
 + (max_objectid, ctypes.c_ulonglong),
 + (min_offset, ctypes.c_ulonglong),
 + (max_offset, ctypes.c_ulonglong),
 + (min_transid, ctypes.c_ulonglong),
 + (max_transid, ctypes.c_ulonglong),
 + (min_type, ctypes.c_uint),
 + (max_type, ctypes.c_uint),
 + (nr_items, ctypes.c_uint),
 + (unused, ctypes.c_uint),
 + (unused1, ctypes.c_ulonglong),
 + (unused2, ctypes.c_ulonglong),
 + (unused3, ctypes.c_ulonglong),
 + (unused4, ctypes.c_ulonglong),
 + (buf_size, ctypes.c_ulonglong),
 + (buf, ctypes.c_ubyte * args_buffer_size),
 +   ]
 +
 +# the search ioctl resturns one header for each item


 +class btrfs_ioctl_search_header(ctypes.Structure):
 +_pack_ = 1
 +_fields_ = [ (transid, ctypes.c_ulonglong),
 + (objectid, ctypes.c_ulonglong),
 + (offset, ctypes.c_ulonglong),
 + (type, ctypes.c_uint),
 + (len, ctypes.c_uint),
 +   ]
 +
 +# the type field in btrfs_file_extent_item
 +BTRFS_FILE_EXTENT_INLINE = 0
 +BTRFS_FILE_EXTENT_REG = 1
 +BTRFS_FILE_EXTENT_PREALLOC = 2
 +
 +class btrfs_file_extent_item(ctypes.LittleEndianStructure):
 +_pack_ = 1
 +_fields_ = [ (generation, ctypes.c_ulonglong),
 + (ram_bytes, ctypes.c_ulonglong),
 + (compression, ctypes.c_ubyte),
 + (encryption, ctypes.c_ubyte),
 + (other_encoding, ctypes.c_ubyte * 2),
 

Re: [PATCH RFC] btrfs-progs: Add simple python front end to the search ioctl

2014-09-23 Thread cwillu
Damn you gmail...
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] btrfs-progs: scan /proc/partitions not all of /dev with -d

2014-09-23 Thread David Sterba
Hi,

all 5 patches will be in the next integration. I haven't tested them
yet, seems it's a bit more important to make a more stable devel base
for more updates you might want to send.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.

2014-09-23 Thread Eric Sandeen
On 9/23/14 7:49 AM, Chris Mason wrote:
 On 09/23/2014 01:40 AM, Qu Wenruo wrote:
 [BUG]
 Originally when mount btrfs with -o subvol= mount option, btrfs will
 lose all security lable.
 And if the btrfs fs is mounted somewhere else, due to the lost of
 security lable, SELinux will refuse to mount since the same super block
 is being mounted using different security lable.

 [REPRODUCER]
 With SELinux enabled:
  #mkfs -t btrfs /dev/sda5
  #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
  #btrfs subvolume create /mnt/btrfs/subvol
  #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
   /mnt/test

 kernel message:
 SELinux: mount invalid.  Same superblock, different security settings
 for (dev sda5, type btrfs)

 [REASON]
 This happens because btrfs will call vfs_kern_mount() and then
 mount_subtree() to handle subvolume name lookup.
 First mount will cut off all the security lables and when it comes to
 the second vfs_kern_mount(), it has no security label now.

 [FIX]
 This patch will makes btrfs behavior much more like nfs,
 which has the type flag FS_BINARY_MOUNTDATA,
 making btrfs handles the security label internally.
 So security label will be set in the real mount time and won't lose
 label when use with subvol= mount option.
 
 Thanks for working on this.  Eric Sandeen (cc'd) was trying out
 something similar recently, so I want to make sure this doesn't conflict
 with his ideas.

My ideas didn't get very far.  ;)

What I was after was a way for multiple subvolumes to have unique contexts.
It looks like this might do the trick, as long as they are mounted on a unique
mount point.

Would this allow subvolume create to take a context, so that everything
under /mnt/btrfs/subvol/ has a unique subvol-wide context?

thanks,
-Eric
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2 v2] Btrfs: be aware of btree inode write errors to avoid fs corruption

2014-09-23 Thread Filipe Manana
While we have a transaction ongoing, the VM might decide at any time
to call btree_inode-i_mapping-a_ops-writepages(), which will start
writeback of dirty pages belonging to btree nodes/leafs. This call
might return an error or the writeback might finish with an error
before we attempt to commit the running transaction. If this happens,
we might have no way of knowing that such error happened when we are
committing the transaction - because the pages might no longer be
marked dirty nor tagged for writeback (if a subsequent modification
to the extent buffer didn't happen before the transaction commit) which
makes filemap_fdata[write|wait]_range unable to find such pages (even
if they're marked with SetPageError).
So if this happens we must abort the transaction, otherwise we commit
a super block with btree roots that point to btree nodes/leafs whose
content on disk is invalid - either garbage or the content of some
node/leaf from a past generation that got cowed or deleted and is no
longer valid (for this later case we end up getting error messages like
parent transid verify failed on 10826481664 wanted 25748 found 29562
when reading btree nodes/leafs from disk).

Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
i_mapping would not be enough because we need to distinguish between
log tree extents (not fatal) vs non-log tree extents (fatal) and
because the next call to filemap_fdatawait_range() will catch and clear
such errors in the mapping - and that call might be from a log sync and
not from a transaction commit, which means we would not know about the
error at transaction commit time. Also, checking for the eb flag
EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
not be completely reliable, as the eb might be removed from memory and
read back when trying to get it, which clears that flag right before
reading the eb's pages from disk, making us not know about the previous
write error.

Using the new counters eb_write_errors and log_eb_write_errors in the
transaction also makes us achieve the goal of AS_EIO/AS_ENOSPC when
writepages() returns success, started writeback for all dirty pages
and before filemap_fdatawait_range() is called, the writeback for
all dirty pages had already finished with errors - because we were
not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return
success, as it could not know that writeback errors happened (the
pages were no longer tagged for writeback).

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: If an extent buffer's write failed but it's also deleted from the tree
before the transaction commits, don't abort the transaction with -EIO,
since the unwritten node/leaf it represents can't be pointed to by any
other node in a tree.

 fs/btrfs/disk-io.c |  4 +--
 fs/btrfs/extent-tree.c |  8 ++
 fs/btrfs/extent_io.c   | 76 +-
 fs/btrfs/extent_io.h   |  3 +-
 fs/btrfs/transaction.c | 22 +--
 fs/btrfs/transaction.h |  5 ++--
 fs/btrfs/tree-log.c| 13 +
 7 files changed, 111 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 23393ec..8b54acf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio 
*io_bio,
goto err;
 
eb-read_mirror = mirror;
-   if (test_bit(EXTENT_BUFFER_IOERR, eb-bflags)) {
+   if (test_bit(EXTENT_BUFFER_READ_ERR, eb-bflags)) {
ret = -EIO;
goto err;
}
@@ -683,7 +683,7 @@ static int btree_io_failed_hook(struct page *page, int 
failed_mirror)
struct btrfs_root *root = BTRFS_I(page-mapping-host)-root;
 
eb = (struct extent_buffer *)page-private;
-   set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
+   set_bit(EXTENT_BUFFER_READ_ERR, eb-bflags);
eb-read_mirror = failed_mirror;
atomic_dec(eb-io_pages);
if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, eb-bflags))
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ef0845d..608814b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6221,6 +6221,14 @@ out:
 * anymore.
 */
clear_bit(EXTENT_BUFFER_CORRUPT, buf-bflags);
+   /*
+* The unwritten node/leaf (due to an IO error) isn't pointed to by any
+* other node in a tree, so it's safe to forget about the write error
+* and avoid a transaction abort.
+*/
+   if (test_and_clear_bit(EXTENT_BUFFER_WRITE_ERR, buf-bflags))
+   atomic_dec(trans-transaction-eb_write_errors);
+
btrfs_put_block_group(cache);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 91f866c..e21f200 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -20,6 +20,7 @@
 #include locking.h
 #include rcu-string.h
 #include backref.h
+#include transaction.h
 
 static struct kmem_cache 

[PATCH 2/2 v3] Btrfs: be aware of btree inode write errors to avoid fs corruption

2014-09-23 Thread Filipe Manana
While we have a transaction ongoing, the VM might decide at any time
to call btree_inode-i_mapping-a_ops-writepages(), which will start
writeback of dirty pages belonging to btree nodes/leafs. This call
might return an error or the writeback might finish with an error
before we attempt to commit the running transaction. If this happens,
we might have no way of knowing that such error happened when we are
committing the transaction - because the pages might no longer be
marked dirty nor tagged for writeback (if a subsequent modification
to the extent buffer didn't happen before the transaction commit) which
makes filemap_fdata[write|wait]_range unable to find such pages (even
if they're marked with SetPageError).
So if this happens we must abort the transaction, otherwise we commit
a super block with btree roots that point to btree nodes/leafs whose
content on disk is invalid - either garbage or the content of some
node/leaf from a past generation that got cowed or deleted and is no
longer valid (for this later case we end up getting error messages like
parent transid verify failed on 10826481664 wanted 25748 found 29562
when reading btree nodes/leafs from disk).

Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
i_mapping would not be enough because we need to distinguish between
log tree extents (not fatal) vs non-log tree extents (fatal) and
because the next call to filemap_fdatawait_range() will catch and clear
such errors in the mapping - and that call might be from a log sync and
not from a transaction commit, which means we would not know about the
error at transaction commit time. Also, checking for the eb flag
EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
not be completely reliable, as the eb might be removed from memory and
read back when trying to get it, which clears that flag right before
reading the eb's pages from disk, making us not know about the previous
write error.

Using the new counters eb_write_errors and log_eb_write_errors in the
transaction also makes us achieve the goal of AS_EIO/AS_ENOSPC when
writepages() returns success, started writeback for all dirty pages
and before filemap_fdatawait_range() is called, the writeback for
all dirty pages had already finished with errors - because we were
not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return
success, as it could not know that writeback errors happened (the
pages were no longer tagged for writeback).

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: If an extent buffer's write failed but it's also deleted from the tree
before the transaction commits, don't abort the transaction with -EIO,
since the unwritten node/leaf it represents can't be pointed to by any
other node in a tree.

V3: Correct V2, missed unstaged changes.

 fs/btrfs/disk-io.c |  4 +--
 fs/btrfs/extent-tree.c |  8 ++
 fs/btrfs/extent_io.c   | 76 +-
 fs/btrfs/extent_io.h   |  3 +-
 fs/btrfs/transaction.c | 22 +--
 fs/btrfs/transaction.h |  5 ++--
 fs/btrfs/tree-log.c| 13 +
 7 files changed, 111 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 23393ec..8b54acf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio 
*io_bio,
goto err;
 
eb-read_mirror = mirror;
-   if (test_bit(EXTENT_BUFFER_IOERR, eb-bflags)) {
+   if (test_bit(EXTENT_BUFFER_READ_ERR, eb-bflags)) {
ret = -EIO;
goto err;
}
@@ -683,7 +683,7 @@ static int btree_io_failed_hook(struct page *page, int 
failed_mirror)
struct btrfs_root *root = BTRFS_I(page-mapping-host)-root;
 
eb = (struct extent_buffer *)page-private;
-   set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
+   set_bit(EXTENT_BUFFER_READ_ERR, eb-bflags);
eb-read_mirror = failed_mirror;
atomic_dec(eb-io_pages);
if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, eb-bflags))
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ef0845d..608814b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6221,6 +6221,14 @@ out:
 * anymore.
 */
clear_bit(EXTENT_BUFFER_CORRUPT, buf-bflags);
+   /*
+* The unwritten node/leaf (due to an IO error) isn't pointed to by any
+* other node in a tree, so it's safe to forget about the write error
+* and avoid a transaction abort.
+*/
+   if (test_and_clear_bit(EXTENT_BUFFER_WRITE_ERR, buf-bflags))
+   atomic_dec(trans-transaction-eb_write_errors);
+
btrfs_put_block_group(cache);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 91f866c..da1706f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -20,6 +20,7 @@
 #include locking.h
 #include rcu-string.h
 #include backref.h
+#include transaction.h
 

Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.

2014-09-23 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options 
internally to avoid losing security label.

From: Eric Sandeen sand...@redhat.com
To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com, 
linux-btrfs@vger.kernel.org

Date: 2014年09月24日 02:51

On 9/23/14 7:49 AM, Chris Mason wrote:

On 09/23/2014 01:40 AM, Qu Wenruo wrote:

[BUG]
Originally when mount btrfs with -o subvol= mount option, btrfs will
lose all security lable.
And if the btrfs fs is mounted somewhere else, due to the lost of
security lable, SELinux will refuse to mount since the same super block
is being mounted using different security lable.

[REPRODUCER]
With SELinux enabled:
  #mkfs -t btrfs /dev/sda5
  #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
  #btrfs subvolume create /mnt/btrfs/subvol
  #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
   /mnt/test

kernel message:
SELinux: mount invalid.  Same superblock, different security settings
for (dev sda5, type btrfs)

[REASON]
This happens because btrfs will call vfs_kern_mount() and then
mount_subtree() to handle subvolume name lookup.
First mount will cut off all the security lables and when it comes to
the second vfs_kern_mount(), it has no security label now.

[FIX]
This patch will makes btrfs behavior much more like nfs,
which has the type flag FS_BINARY_MOUNTDATA,
making btrfs handles the security label internally.
So security label will be set in the real mount time and won't lose
label when use with subvol= mount option.

Thanks for working on this.  Eric Sandeen (cc'd) was trying out
something similar recently, so I want to make sure this doesn't conflict
with his ideas.

My ideas didn't get very far.  ;)

What I was after was a way for multiple subvolumes to have unique contexts.
It looks like this might do the trick, as long as they are mounted on a unique
mount point.

Would this allow subvolume create to take a context, so that everything
under /mnt/btrfs/subvol/ has a unique subvol-wide context?

thanks,
-Eric

Did you mean the following situation?
/dev/sdb default subvol(FS_TREE) mounted on /mnt/default with context A
/dev/sdb subvol=subvol mounted on /mnt/subvol with context B

If that's your goal, I am afraid that my patch can't achieve it and even 
worse, will even forbid it. :(


SELinux doesn't allow same superblock mounted with different context, 
and the patch follows it.
If SELinux is modified to allow same superblock different context, then 
my patch also needs to be modified.


Thanks,
Qu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-23 Thread Qu Wenruo


 Original Message 
Subject: Re: general thoughts and questions + general and RAID5/6 stability?
From: Tobias Holst to...@tobby.eu
To: linux-btrfs@vger.kernel.org
Date: 2014年09月23日 22:24

If it is unknown, which of these options have been used at btrfs
creation time - is it possible to check the state of these options
afterwards on a mounted or unmounted filesystem?

For mounted fs, sysfs can be used to see the feature enabled:
/sys/fs/btrfs/UUID/features/

For unmounted fs, maybe not the best one, btrfs-show-super can show the 
incompat_flags in hex,
and we can check kernel tree/fs/btrfs/ctree.h for 
BTRFS_FEATURE_INCOMPAT_## for the bits

and calculate by hand...
(Would it be better to add human readable output for btrfs-show-super?)

Thanks,
Qu



2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com:

Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the 
following list of features:
mixed-bg- mixed data and metadata block groups
extref  - increased hard-link limit per file to 65536
raid56  - raid56 extended format
skinny-metadata - reduced size metadata extent refs
no-holes- no explicit hole extents for files

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Help Me Learn Btrfs Code Base

2014-09-23 Thread nick
Hello Guys,
I am wondering if anybody here would like to mentor me in the btrfs code base 
in if they have
any free time. I am not asking for much just someone to answer my questions and 
help me learn
the code base well enough to help out.
Thanks for Any Help,
Nick
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Add human readable incompat flags output for btrfs-show-super

2014-09-23 Thread Qu Wenruo
Add human readable incompat flags output for btrfs-show-super,
now no longer needs to calculate the hex flags by hand.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-show-super.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/btrfs-show-super.c b/btrfs-show-super.c
index 38c5d26..c591865 100644
--- a/btrfs-show-super.c
+++ b/btrfs-show-super.c
@@ -285,6 +285,58 @@ static void print_backup_roots(struct btrfs_super_block 
*sb)
}
 }
 
+struct readable_flag_entry {
+   u64 bit;
+   char *output;
+};
+
+#define DEF_INCOMPAT_FLAG_ENTRY(bit_name)  \
+   {BTRFS_FEATURE_INCOMPAT_##bit_name, #bit_name}
+
+struct readable_flag_entry incompat_flags_array[] = {
+   DEF_INCOMPAT_FLAG_ENTRY(MIXED_BACKREF),
+   DEF_INCOMPAT_FLAG_ENTRY(DEFAULT_SUBVOL),
+   DEF_INCOMPAT_FLAG_ENTRY(MIXED_GROUPS),
+   DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZO),
+   DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZOv2),
+   DEF_INCOMPAT_FLAG_ENTRY(BIG_METADATA),
+   DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF),
+   DEF_INCOMPAT_FLAG_ENTRY(RAID56),
+   DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
+   DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES)
+};
+static const int incompat_flags_num = sizeof(incompat_flags_array) /
+ sizeof(struct readable_flag_entry);
+
+static void print_readable_incompat_flag(u64 flag)
+{
+   int i;
+   int first = 1;
+   struct readable_flag_entry *entry;
+
+   if (!flag)
+   return;
+   printf(\t\t\t( );
+   for (i = 0; i  incompat_flags_num; i++) {
+   entry = incompat_flags_array + i;
+   if (flag  entry-bit) {
+   if (first)
+   printf(%s , entry-output);
+   else
+   printf(|\n\t\t\t  %s , entry-output);
+   }
+   first = 0;
+   }
+   flag = ~BTRFS_FEATURE_INCOMPAT_SUPP;
+   if (flag) {
+   if (first)
+   printf(unknown flag:%llx , flag);
+   else
+   printf(|\n\t\t\t  unknown flag:%llx , flag);
+   }
+   printf()\n);
+}
+
 static void dump_superblock(struct btrfs_super_block *sb, int full)
 {
int i;
@@ -364,6 +416,7 @@ static void dump_superblock(struct btrfs_super_block *sb, 
int full)
   (unsigned long long)btrfs_super_compat_ro_flags(sb));
printf(incompat_flags\t\t0x%llx\n,
   (unsigned long long)btrfs_super_incompat_flags(sb));
+   print_readable_incompat_flag(btrfs_super_incompat_flags(sb));
printf(csum_type\t\t%llu\n,
   (unsigned long long)btrfs_super_csum_type(sb));
printf(csum_size\t\t%llu\n,
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.

2014-09-23 Thread Eric Sandeen
On 9/23/14 7:31 PM, Qu Wenruo wrote:
 
  Original Message 
 Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options 
 internally to avoid losing security label.
 From: Eric Sandeen sand...@redhat.com
 To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com, 
 linux-btrfs@vger.kernel.org
 Date: 2014年09月24日 02:51
 On 9/23/14 7:49 AM, Chris Mason wrote:
 On 09/23/2014 01:40 AM, Qu Wenruo wrote:
 [BUG]
 Originally when mount btrfs with -o subvol= mount option, btrfs will
 lose all security lable.
 And if the btrfs fs is mounted somewhere else, due to the lost of
 security lable, SELinux will refuse to mount since the same super block
 is being mounted using different security lable.

 [REPRODUCER]
 With SELinux enabled:
   #mkfs -t btrfs /dev/sda5
   #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
   #btrfs subvolume create /mnt/btrfs/subvol
   #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
/mnt/test

 kernel message:
 SELinux: mount invalid.  Same superblock, different security settings
 for (dev sda5, type btrfs)

 [REASON]
 This happens because btrfs will call vfs_kern_mount() and then
 mount_subtree() to handle subvolume name lookup.
 First mount will cut off all the security lables and when it comes to
 the second vfs_kern_mount(), it has no security label now.

 [FIX]
 This patch will makes btrfs behavior much more like nfs,
 which has the type flag FS_BINARY_MOUNTDATA,
 making btrfs handles the security label internally.
 So security label will be set in the real mount time and won't lose
 label when use with subvol= mount option.
 Thanks for working on this.  Eric Sandeen (cc'd) was trying out
 something similar recently, so I want to make sure this doesn't conflict
 with his ideas.
 My ideas didn't get very far.  ;)

 What I was after was a way for multiple subvolumes to have unique contexts.
 It looks like this might do the trick, as long as they are mounted on a 
 unique
 mount point.

 Would this allow subvolume create to take a context, so that everything
 under /mnt/btrfs/subvol/ has a unique subvol-wide context?

 thanks,
 -Eric
 Did you mean the following situation?
 /dev/sdb default subvol(FS_TREE) mounted on /mnt/default with context A
 /dev/sdb subvol=subvol mounted on /mnt/subvol with context B
 
 If that's your goal, I am afraid that my patch can't achieve it and even 
 worse, will even forbid it. :(
 
 SELinux doesn't allow same superblock mounted with different context, and the 
 patch follows it.
 If SELinux is modified to allow same superblock different context, then my 
 patch also needs to be modified.

oh, ok, I see.

I don't think that my wish should disallow your patch.

For the problem I was looking at, I think the only way forward would require
some significant selinux modification, and treating a subvol root essentially
like a superblock.

So ... don't let me slow you down, at least for now.  ;)

-Eric

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Make btrfs handle security mount options internally to avoid losing security label.

2014-09-23 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options 
internally to avoid losing security label.

From: Eric Sandeen sand...@redhat.com
To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com, 
linux-btrfs@vger.kernel.org

Date: 2014年09月24日 11:33

On 9/23/14 7:31 PM, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Make btrfs handle security mount options internally 
to avoid losing security label.
From: Eric Sandeen sand...@redhat.com
To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com, 
linux-btrfs@vger.kernel.org
Date: 2014年09月24日 02:51

On 9/23/14 7:49 AM, Chris Mason wrote:

On 09/23/2014 01:40 AM, Qu Wenruo wrote:

[BUG]
Originally when mount btrfs with -o subvol= mount option, btrfs will
lose all security lable.
And if the btrfs fs is mounted somewhere else, due to the lost of
security lable, SELinux will refuse to mount since the same super block
is being mounted using different security lable.

[REPRODUCER]
With SELinux enabled:
   #mkfs -t btrfs /dev/sda5
   #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
   #btrfs subvolume create /mnt/btrfs/subvol
   #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
/mnt/test

kernel message:
SELinux: mount invalid.  Same superblock, different security settings
for (dev sda5, type btrfs)

[REASON]
This happens because btrfs will call vfs_kern_mount() and then
mount_subtree() to handle subvolume name lookup.
First mount will cut off all the security lables and when it comes to
the second vfs_kern_mount(), it has no security label now.

[FIX]
This patch will makes btrfs behavior much more like nfs,
which has the type flag FS_BINARY_MOUNTDATA,
making btrfs handles the security label internally.
So security label will be set in the real mount time and won't lose
label when use with subvol= mount option.

Thanks for working on this.  Eric Sandeen (cc'd) was trying out
something similar recently, so I want to make sure this doesn't conflict
with his ideas.

My ideas didn't get very far.  ;)

What I was after was a way for multiple subvolumes to have unique contexts.
It looks like this might do the trick, as long as they are mounted on a unique
mount point.

Would this allow subvolume create to take a context, so that everything
under /mnt/btrfs/subvol/ has a unique subvol-wide context?

thanks,
-Eric

Did you mean the following situation?
/dev/sdb default subvol(FS_TREE) mounted on /mnt/default with context A
/dev/sdb subvol=subvol mounted on /mnt/subvol with context B

If that's your goal, I am afraid that my patch can't achieve it and even worse, 
will even forbid it. :(

SELinux doesn't allow same superblock mounted with different context, and the 
patch follows it.
If SELinux is modified to allow same superblock different context, then my 
patch also needs to be modified.

oh, ok, I see.

I don't think that my wish should disallow your patch.

For the problem I was looking at, I think the only way forward would require
some significant selinux modification, and treating a subvol root essentially
like a superblock.

So ... don't let me slow you down, at least for now.  ;)

-Eric

BTW, since with the patch btrfs can in fact don't call 
security_sb_set_mnt_opts() if btrfs wants,
what about set context to the dentry or something like that, but not set 
to superblock?

I may be wrong, since I am still not famaliar with security parts...

Thanks,
Qu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2 v4] Btrfs: be aware of btree inode write errors to avoid fs corruption

2014-09-23 Thread Filipe Manana
While we have a transaction ongoing, the VM might decide at any time
to call btree_inode-i_mapping-a_ops-writepages(), which will start
writeback of dirty pages belonging to btree nodes/leafs. This call
might return an error or the writeback might finish with an error
before we attempt to commit the running transaction. If this happens,
we might have no way of knowing that such error happened when we are
committing the transaction - because the pages might no longer be
marked dirty nor tagged for writeback (if a subsequent modification
to the extent buffer didn't happen before the transaction commit) which
makes filemap_fdata[write|wait]_range unable to find such pages (even
if they're marked with SetPageError).
So if this happens we must abort the transaction, otherwise we commit
a super block with btree roots that point to btree nodes/leafs whose
content on disk is invalid - either garbage or the content of some
node/leaf from a past generation that got cowed or deleted and is no
longer valid (for this later case we end up getting error messages like
parent transid verify failed on 10826481664 wanted 25748 found 29562
when reading btree nodes/leafs from disk).

Note that setting and checking AS_EIO/AS_ENOSPC in the btree inode's
i_mapping would not be enough because we need to distinguish between
log tree extents (not fatal) vs non-log tree extents (fatal) and
because the next call to filemap_fdatawait_range() will catch and clear
such errors in the mapping - and that call might be from a log sync and
not from a transaction commit, which means we would not know about the
error at transaction commit time. Also, checking for the eb flag
EXTENT_BUFFER_IOERR at transaction commit time isn't done and would
not be completely reliable, as the eb might be removed from memory and
read back when trying to get it, which clears that flag right before
reading the eb's pages from disk, making us not know about the previous
write error.

Using the new counters eb_write_errors and log_eb_write_errors in the
transaction also makes us achieve the goal of AS_EIO/AS_ENOSPC when
writepages() returns success, started writeback for all dirty pages
and before filemap_fdatawait_range() is called, the writeback for
all dirty pages had already finished with errors - because we were
not using AS_EIO/AS_ENOSPC, filemap_fdatawait_range() would return
success, as it could not know that writeback errors happened (the
pages were no longer tagged for writeback).

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: If an extent buffer's write failed but it's also deleted from the tree
before the transaction commits, don't abort the transaction with -EIO,
since the unwritten node/leaf it represents can't be pointed to by any
other node in a tree.

V3: Correct V2, missed unstaged changes.

V4: Use root's key to figure out which counter to update.

 fs/btrfs/disk-io.c |  4 +--
 fs/btrfs/extent-tree.c | 12 +
 fs/btrfs/extent_io.c   | 71 +-
 fs/btrfs/extent_io.h   |  3 ++-
 fs/btrfs/transaction.c | 22 +---
 fs/btrfs/transaction.h |  5 ++--
 fs/btrfs/tree-log.c| 13 +
 7 files changed, 110 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 23393ec..8b54acf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio 
*io_bio,
goto err;
 
eb-read_mirror = mirror;
-   if (test_bit(EXTENT_BUFFER_IOERR, eb-bflags)) {
+   if (test_bit(EXTENT_BUFFER_READ_ERR, eb-bflags)) {
ret = -EIO;
goto err;
}
@@ -683,7 +683,7 @@ static int btree_io_failed_hook(struct page *page, int 
failed_mirror)
struct btrfs_root *root = BTRFS_I(page-mapping-host)-root;
 
eb = (struct extent_buffer *)page-private;
-   set_bit(EXTENT_BUFFER_IOERR, eb-bflags);
+   set_bit(EXTENT_BUFFER_READ_ERR, eb-bflags);
eb-read_mirror = failed_mirror;
atomic_dec(eb-io_pages);
if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, eb-bflags))
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ef0845d..bdacd33 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6221,6 +6221,18 @@ out:
 * anymore.
 */
clear_bit(EXTENT_BUFFER_CORRUPT, buf-bflags);
+   /*
+* The unwritten node/leaf (due to an IO error) isn't pointed to by any
+* other node in a tree, so it's safe to forget about the write error
+* and avoid a transaction abort.
+*/
+   if (test_and_clear_bit(EXTENT_BUFFER_WRITE_ERR, buf-bflags)) {
+   if (root-root_key.objectid == BTRFS_TREE_LOG_OBJECTID)
+   atomic_dec(trans-transaction-log_eb_write_errors);
+   else
+   atomic_dec(trans-transaction-eb_write_errors);
+   }
+
btrfs_put_block_group(cache);
 

[PATCH v2] btrfs-progs: Add human readable incompat flags output for btrfs-show-super

2014-09-23 Thread Qu Wenruo
Add human readable incompat flags output for btrfs-show-super,
now no longer needs to calculate the hex flags by hand.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
changelog:
v2: Add the mising 0x before hex unknown incompat flags
---
 btrfs-show-super.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/btrfs-show-super.c b/btrfs-show-super.c
index 38c5d26..456dbd8 100644
--- a/btrfs-show-super.c
+++ b/btrfs-show-super.c
@@ -285,6 +285,58 @@ static void print_backup_roots(struct btrfs_super_block 
*sb)
}
 }
 
+struct readable_flag_entry {
+   u64 bit;
+   char *output;
+};
+
+#define DEF_INCOMPAT_FLAG_ENTRY(bit_name)  \
+   {BTRFS_FEATURE_INCOMPAT_##bit_name, #bit_name}
+
+struct readable_flag_entry incompat_flags_array[] = {
+   DEF_INCOMPAT_FLAG_ENTRY(MIXED_BACKREF),
+   DEF_INCOMPAT_FLAG_ENTRY(DEFAULT_SUBVOL),
+   DEF_INCOMPAT_FLAG_ENTRY(MIXED_GROUPS),
+   DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZO),
+   DEF_INCOMPAT_FLAG_ENTRY(COMPRESS_LZOv2),
+   DEF_INCOMPAT_FLAG_ENTRY(BIG_METADATA),
+   DEF_INCOMPAT_FLAG_ENTRY(EXTENDED_IREF),
+   DEF_INCOMPAT_FLAG_ENTRY(RAID56),
+   DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
+   DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES)
+};
+static const int incompat_flags_num = sizeof(incompat_flags_array) /
+ sizeof(struct readable_flag_entry);
+
+static void print_readable_incompat_flag(u64 flag)
+{
+   int i;
+   int first = 1;
+   struct readable_flag_entry *entry;
+
+   if (!flag)
+   return;
+   printf(\t\t\t( );
+   for (i = 0; i  incompat_flags_num; i++) {
+   entry = incompat_flags_array + i;
+   if (flag  entry-bit) {
+   if (first)
+   printf(%s , entry-output);
+   else
+   printf(|\n\t\t\t  %s , entry-output);
+   }
+   first = 0;
+   }
+   flag = ~BTRFS_FEATURE_INCOMPAT_SUPP;
+   if (flag) {
+   if (first)
+   printf(unknown flag: 0x%llx , flag);
+   else
+   printf(|\n\t\t\t  unknown flag: 0x%llx , flag);
+   }
+   printf()\n);
+}
+
 static void dump_superblock(struct btrfs_super_block *sb, int full)
 {
int i;
@@ -364,6 +416,7 @@ static void dump_superblock(struct btrfs_super_block *sb, 
int full)
   (unsigned long long)btrfs_super_compat_ro_flags(sb));
printf(incompat_flags\t\t0x%llx\n,
   (unsigned long long)btrfs_super_incompat_flags(sb));
+   print_readable_incompat_flag(btrfs_super_incompat_flags(sb));
printf(csum_type\t\t%llu\n,
   (unsigned long long)btrfs_super_csum_type(sb));
printf(csum_size\t\t%llu\n,
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html