date:20140724

From:
http://man7.org/linux/man-pages/man7/man-pages.7.html
...
AUTHORS lists authors of the documentation or program.  Use of
an AUTHORS section is strongly discouraged. Generally,
it is better not to clutter every page with a list of
(over time potentially numerous) authors; if you write
or significantly amend a page, add a copyright notice
as a comment in the source file.  If you are the author
of a device driver and want to include an address for
reporting bugs, place this under the BUGS section.
...

Suggested-by: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 Documentation/btrfs-convert.txt     | 12 ------------
 Documentation/btrfs-debug-tree.txt  | 12 ------------
 Documentation/btrfs-find-root.txt   | 12 ------------
 Documentation/btrfs-image.txt       | 12 ------------
 Documentation/btrfs-map-logical.txt | 12 ------------
 Documentation/btrfs-show-super.txt  | 12 ------------
 Documentation/btrfs-zero-log.txt    | 12 ------------
 Documentation/btrfstune.txt         | 12 ------------
 8 files changed, 96 deletions(-)

diff --git a/Documentation/btrfs-convert.txt b/Documentation/btrfs-convert.txt
index 1eff0bf..11d6044 100644
--- a/Documentation/btrfs-convert.txt
+++ b/Documentation/btrfs-convert.txt
@@ -31,18 +31,6 @@ EXIT STATUS
 *btrfs-convert* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-debug-tree.txt b/Documentation/btrfs-debug-tree.txt
index bfc8aa4..23fc115 100644
--- a/Documentation/btrfs-debug-tree.txt
+++ b/Documentation/btrfs-debug-tree.txt
@@ -33,18 +33,6 @@ EXIT STATUS
 *btrfs-debug-tree* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-find-root.txt b/Documentation/btrfs-find-root.txt
index a360f8f..c934b4c 100644
--- a/Documentation/btrfs-find-root.txt
+++ b/Documentation/btrfs-find-root.txt
@@ -28,18 +28,6 @@ EXIT STATUS
 *btrfs-find-root* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-image.txt b/Documentation/btrfs-image.txt
index 155194a..b7751f9 100644
--- a/Documentation/btrfs-image.txt
+++ b/Documentation/btrfs-image.txt
@@ -56,18 +56,6 @@ EXIT STATUS
 *btrfs-image* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-map-logical.txt b/Documentation/btrfs-map-logical.txt
index a8710bf..a3d110c 100644
--- a/Documentation/btrfs-map-logical.txt
+++ b/Documentation/btrfs-map-logical.txt
@@ -32,18 +32,6 @@ EXIT STATUS
 *btrfs-map-logical* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html.
-
-This is free software: you are free  to  change  and  redistribute  it. There is NO WARRANTY, to the extent permitted by law.
-
 SEE ALSO
 
 `mkfs.btrfs`(8)
diff --git a/Documentation/btrfs-show-super.txt b/Documentation/btrfs-show-super.txt
index 6fee0f1..1646be3 100644
--- a/Documentation/btrfs-show-super.txt
+++ b/Documentation/btrfs-show-super.txt
@@ -45,18 +45,6 @@ EXIT STATUS
 *btrfs-show-super* will return 0 if no error happened.
 If any problems happened, 1 will be returned.
 
-AUTHOR
---
-Written by Shilong Wang and Wenruo Qu.
-
-COPYRIGHT
--
-Copyright (C) 2013 FUJITSU LIMITED.
-
-License GPLv2: GNU GPL version 2

Re: feature request: consider rw subvols ro for send when volume is mounted ro

On Wed, Jul 23, 2014 at 01:47:36PM -0700, Zach Brown wrote:
 On Wed, Jul 23, 2014 at 02:10:29PM -0600, Chris Murphy wrote:
  The use case is when it's possible to mount a Btrfs volume ro, but not rw. 
  Example, a situation where
  
  # mount -o degraded /dev/sdb /mnt
  [   71.064352] BTRFS info (device sdb): allowing degraded mounts
  [   71.064812] BTRFS info (device sdb): enabling auto recovery
  [   71.065210] BTRFS info (device sdb): disk space caching is enabled
  [   71.072068] BTRFS warning (device sdb): devid 2 missing
  [   71.097320] BTRFS: too many missing devices, writeable mount is not allowed
  [   71.116616] BTRFS: open_ctree failed
  
  Yet this works:
  # mount -o degraded,ro /dev/sdb /mnt
  
  It would be great if it were possible to send/receive subvolumes to a
  different btrfs volume. Currently it's not possible because those
  subvols aren't ro, and because the mount is ro I can't make ro
  snapshots first.
 
 I wonder if that's as easy as the following totally untested hack.  I
 have no idea if a read-only mount would still allow background
 modification that might violate the send code's assumptions.

A RO mount tries hard not to do any writes (e.g. from the background
threads); however, a remount to RW during send would succeed, and any
writes to the sent subvolume may (and most probably will) cause lots of
fun.

This could use protection similar to that of the subvolumes; the use case
'allow sending any subvolume on a RO mount' seems valid to me. The failure
of remount,rw is not silent and the user is able to decide what to do next
(stop send, or postpone remount). Remount may fail for other reasons, so
I think we're not adding any unexpected surprises.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/6] Btrfs: fix wrong skipping compression for an inode

On Thu, Jul 17, 2014 at 11:44:09AM +0800, Wang Shilong wrote:
 If a file's compression ratio is bad, we will set the NOCOMPRESS
 flag for it, and compression will be skipped for that inode next time.
 
 However, if we remount the fs with COMPRESS_FORCE, it should still try
 to compress pages for that inode; this patch fixes the wrong
 check for this problem.
 
 Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

The documented compression precedence
https://btrfs.wiki.kernel.org/index.php/Compression#What.27s_the_precedence_of_all_the_options_affecting_compression.3F

matches the way you've fixed it.

Re: [PATCH 2/6] Btrfs: fall into nocompression codes quickly if possible

On Thu, Jul 17, 2014 at 11:44:10AM +0800, Wang Shilong wrote:
 If flag NOCOMPRESS is set, which means a bad compression ratio,
 we can avoid calling cow_file_range_async() for this case earlier.
 
 Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

Re: [PATCH 3/6] Btrfs: fix off-by-one in cow_file_range_inline()

On Thu, Jul 17, 2014 at 11:44:11AM +0800, Wang Shilong wrote:
 Btrfs can still inline file data if its size is the same as the
 page size, so don't skip the max value here.
 
 Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

Re: [PATCH 4/6] Btrfs: fix wrong max inline data size limit

On Thu, Jul 17, 2014 at 11:44:12AM +0800, Wang Shilong wrote:
 Inline data is stored starting at the offset of @disk_bytenr in
 struct btrfs_file_extent_item. So subtracting the total
 size of struct btrfs_file_extent_item is wrong; fix it.
 
 Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

  #define BTRFS_MAX_INLINE_DATA_SIZE(r) (BTRFS_LEAF_DATA_SIZE(r) - \
   sizeof(struct btrfs_item) - \
 - sizeof(struct btrfs_file_extent_item))
 + offsetof(struct btrfs_file_extent_item, disk_bytenr))

This increases the limit of inline data by 24 bytes but fortunately
does not break existing filesystems, because
BTRFS_MAX_INLINE_DATA_SIZE is used at the time the inlining is decided.
IOW it is a bit pessimistic; the rest of the code uses the offsetof
value.

Re: [PATCH 5/6] Btrfs: fix wrong write range for filemap_fdatawrite_range()

On Thu, Jul 17, 2014 at 11:44:13AM +0800, Wang Shilong wrote:
 filemap_fdatawrite_range() expects the third arg to be @end,
 not @len; fix it.
 
 Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

Good catch.
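
For readers following the @end versus @len distinction, here is a minimal user-space sketch. It assumes the end argument is the inclusive offset of the last byte in the range; `range_last_byte()` is a hypothetical helper for illustration and is not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a (start, len) pair into the inclusive
 * end offset that a range-based writeback call expects.
 * Assumes len > 0.
 */
static inline int64_t range_last_byte(int64_t start, int64_t len)
{
	return start + len - 1;
}
```

Passing len where end is expected would write back a range ending at offset len instead of start + len - 1, which covers the wrong bytes whenever start is nonzero.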

Re: [PATCH 6/6] Btrfs: fix wrong extent mapping for DirectIO

On Thu, Jul 17, 2014 at 11:44:14AM +0800, Wang Shilong wrote:
 btrfs_next_leaf() will use the current leaf's last key to search
 and then return a bigger one. So it may still return a file extent
 item that is smaller than the expected value, and we will
 get an overflow here for @em->len.
 
 This is easy to reproduce with Btrfs direct writing; it did not
 cause any problem, because writing will re-insert the right mapping later.
 
 However, by hacking the code to make DIO support compression, the wrong extent
 mapping is kept and it quickly encounters a merging failure (EEXIST).

So this cannot happen normally (because compression and DIO do not work
together)?

 Fix this problem by looping to find the next file extent item that is bigger
 than @start, or until we cannot find anything more.
 
 Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

[PATCH] Btrfs: btrfs_put_tree_mod_seq: Don't delete an entry whose seq number is min_seq.

2014-07-24 Thread Chandan Rajendra

The current code allows a tree mod log entry whose seq number is equal to
min_seq to be deleted. Fix this.

Signed-off-by: Chandan Rajendra <chan...@linux.vnet.ibm.com>
---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index aeab453..49a0df6 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -428,7 +428,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
 	for (node = rb_first(tm_root); node; node = next) {
 		next = rb_next(node);
 		tm = container_of(node, struct tree_mod_elem, node);
-		if (tm->seq > min_seq)
+		if (tm->seq >= min_seq)
 			continue;
 		rb_erase(node, tm_root);
 		kfree(tm);
-- 
1.8.3.1
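
The boundary condition can be sketched outside the kernel. This is a simplified illustration that replaces the rb-tree with a plain array; `prune_mod_log()` is a hypothetical name, and only the comparison is meant to mirror the fix described in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Drop every entry whose sequence number is below min_seq and keep the
 * rest in order. An entry with seq == min_seq is kept, because a reader
 * holding min_seq may still need it. Returns the number of kept entries.
 */
static size_t prune_mod_log(uint64_t *seqs, size_t n, uint64_t min_seq)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (seqs[i] >= min_seq)
			seqs[kept++] = seqs[i];
	}
	return kept;
}
```

With the old strict comparison, the entry equal to min_seq would have been freed along with the older ones.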


Re: [PATCH] btrfs: Return right extent when fiemap gives unaligned offset and len.

On Fri, Jul 18, 2014 at 09:55:43AM +0800, Qu Wenruo wrote:
 When a page-aligned start and len are passed to extent_fiemap(), the result is
 good, but when start and len are not aligned, e.g. start = 1 and len =
 4095 are passed to extent_fiemap(), it returns no extent.
 
 The problem is that start and len are both rounded down, which causes the
 problem.

ALIGN rounds up, not down. So the wrong rounding uses an incorrect start
(4096) and finds no extents if there's e.g. only one extent, [0,4095].

 This patch will round down start and round up (start + len) to
 return the right extent.
 
 Reported-by: Chandan Rajendra <chan...@linux.vnet.ibm.com>
 Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>
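
The rounding problem discussed above can be reproduced with plain integers. This is a sketch under stated assumptions: a 4096-byte sector size, and helper names (`align_both_up()`, `cover_request()`) that are made up for the example rather than taken from the patch.

```c
#include <stdint.h>

#define SECTOR_SIZE 4096ULL	/* assumed sector size for the example */

struct range {
	uint64_t start;
	uint64_t len;
};

/* Both helpers require the alignment to be a power of two. */
static uint64_t round_down_to(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

static uint64_t round_up_to(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Problematic variant: start and len are each rounded up. */
static struct range align_both_up(uint64_t start, uint64_t len)
{
	struct range r;

	r.start = round_up_to(start, SECTOR_SIZE);
	r.len = round_up_to(len, SECTOR_SIZE);
	return r;
}

/* Fixed variant: round start down and (start + len) up. */
static struct range cover_request(uint64_t start, uint64_t len)
{
	struct range r;

	r.start = round_down_to(start, SECTOR_SIZE);
	r.len = round_up_to(start + len, SECTOR_SIZE) - r.start;
	return r;
}
```

For start = 1 and len = 4095, the first variant begins the search at 4096 and misses an extent covering [0,4095], while the second searches from 0 for 4096 bytes.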

Re: [PATCH 1/2] btrfs: Call mount_subtree() even 'subvolid=' mount option is given.

On Wed, Jul 16, 2014 at 12:07:10PM +0800, Qu Wenruo wrote:
 btrfs uses different routines to handle the 'subvolid=' and 'subvol=' mount
 options.
 Given the 'subvol=' mount option, btrfs will mount btrfs first and then call
 mount_subtree() to mount a subtree of btrfs, making the vfs handle the path
 searching.
 This is good since the vfs layer knows exactly that a subtree mount is done
 and findmnt(8) knows which subtree is mounted.
 
 However, when using the 'subvolid=' mount option, btrfs will do all the
 internal subvolume objectid searching and checking, making the VFS unaware
 of which subtree is mounted; as a result, findmnt(8) can't show any
 useful subtree mount info to end users.
 
 This patch will use the root backref to reverse search the subvolume
 path for a given subvolid, making findmnt(8) work again.
 
 Reported-by: Stefan G.Weichinger <li...@xunil.at>
 Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>

Ack for unifying the way subvol= and subvolid= are handled, but I don't
like some aspects of the implementation.

The kmalloc/krealloc makes it really complicated and is not imho
necessary. The mount options length is limited to PAGE_SIZE in the vfs
code. Do the same here, allocate a page, filter the options, do the
necessary processing and just check for overflows.

You can drop u64_to_strlen.
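
A rough user-space sketch of the fixed-buffer approach suggested above, not the kernel implementation. It assumes a 4096-byte buffer as a stand-in for PAGE_SIZE, and `rewrite_subvol_opts()` is a hypothetical name. It filters out any subvol=/subvolid= option, appends the replacement at the end rather than in place, and reports overflow instead of computing lengths up front.

```c
#include <stdio.h>
#include <string.h>

#define OPTS_BUF_SIZE 4096	/* stand-in for PAGE_SIZE; an assumption */

/*
 * Copy the comma-separated options in args to out, dropping any
 * "subvol=" or "subvolid=" option, then append "subvolid=<id>".
 * Returns 0 on success, -1 if the result does not fit in out_size.
 */
static int rewrite_subvol_opts(const char *args, unsigned long long id,
			       char *out, size_t out_size)
{
	const char *p = args ? args : "";
	size_t used = 0;
	int ret;

	while (*p) {
		size_t n = strcspn(p, ",");

		if (n > 0 &&
		    strncmp(p, "subvol=", 7) != 0 &&
		    strncmp(p, "subvolid=", 9) != 0) {
			/* Need room for the option and a trailing comma. */
			if (used + n + 1 >= out_size)
				return -1;
			memcpy(out + used, p, n);
			used += n;
			out[used++] = ',';
		}
		p += n;
		if (*p == ',')
			p++;
	}

	/* snprintf() returns the untruncated length, so overflow is detectable. */
	ret = snprintf(out + used, out_size - used, "subvolid=%llu", id);
	if (ret < 0 || (size_t)ret >= out_size - used)
		return -1;
	return 0;
}
```

Because snprintf() both formats the number and reports the required length, a separate digit-counting helper is not needed in this sketch.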

 +#define CLEAR_SUBVOL 1
 +#define CLEAR_SUBVOLID   2

Though they're internal and local to the file, please add BTRFS_ prefix
at least.

  /*
 - * This will strip out the subvol=%s argument for an argument string and add
 - * subvolid=0 to make sure we get the actual tree root for path walking to 
 the
 - * subvol we want.
 + * This will strip out the subvol=%s or subvolid=%s argument for an argumen
 + * string and add subvolid=0 to make sure we get the actual tree root for 
 path
 + * walking to the subvol we want.
   */
 -static char *setup_root_args(char *args)
 +static char *setup_root_args(char *args, int flags, u64 subvol_objectid)
  {
 - unsigned len = strlen(args) + 2 + 1;
 - char *src, *dst, *buf;
 + unsigned len;
 + char *src = NULL, *dst, *buf, *comma;

Please use the recommended style and put each on a separate line. I'm
not sure if you'll need all of them for the implementation without the
kmallocs; the comment applies generally.

 + char *subvol_string = "subvolid=";
 + int option_len = 0;
 +
 + if (!args) {
 + /* Case 1, not args, all default mounting
 +  * just return 'subvolid=FS_ROOT' */

Not the preferred style of comments.

 + len = strlen(subvol_string) +
 +   u64_to_strlen(subvol_objectid) + 1;
 + dst = kmalloc(len, GFP_NOFS);
 + if (!dst)
 + return NULL;
 + sprintf(dst, "%s%llu", subvol_string, subvol_objectid);
 + return dst;
 + }
  
 - /*
 -  * We need the same args as before, but with this substitution:
 -  * s!subvol=[^,]+!subvolid=0!
 -  *
 -  * Since the replacement string is up to 2 bytes longer than the
 -  * original, allocate strlen(args) + 2 + 1 bytes.
 -  */
 + switch (flags) {
 + case CLEAR_SUBVOL:
 + src = strstr(args, "subvol=");
 + break;
 + case CLEAR_SUBVOLID:
 + src = strstr(args, "subvolid=");
 + break;
 + }
  
 - src = strstr(args, "subvol=");
 - /* This shouldn't happen, but just in case.. */
 - if (!src)
 - return NULL;
 + if (!src) {
 + /* Case 2, some args, default subvolume mounting
 +  * just append ',subvolid=FS_ROOT' */
 +
 + /* 1 for ending '\0', 1 for leading ',' */
 + len = strlen(args) + strlen(subvol_string) +
 +   u64_to_strlen(subvol_objectid) + 2;
 + dst = kmalloc(len, GFP_NOFS);
 + if (!dst)
 + return NULL;
 + strcpy(dst, args);
 + sprintf(dst + strlen(args), ",%s%llu", subvol_string,
 + subvol_objectid);
 + return dst;
 + }
 +
 + /* Case 3, subvolid=/subvol=  mount
 +  * repalce the 'subvolid/subvol' options to 'subvolid=FS_ROOT' */
 + comma = strchr(src, ',');
 + if (comma)
 + option_len = comma - src;
 + else
 + option_len = strlen(src);
 + len = strlen(args) - option_len  + strlen(subvol_string) +
 +   u64_to_strlen(subvol_objectid) + 1;
  
   buf = dst = kmalloc(len, GFP_NOFS);
   if (!buf)
 @@ -1154,28 +1208,126 @@ static char *setup_root_args(char *args)
   dst += strlen(args);
   }
  
 - strcpy(dst, "subvolid=0");
 - dst += strlen("subvolid=0");
 + len = sprintf(dst, "%s%llu", subvol_string, subvol_objectid);
 + dst += len;
  
   /*
* If there is a , after the original subvol=... string,
* copy that suffix into our buffer.  Otherwise, we're done.
*/
 - src = strchr(src, ',');
 - if (src)
 -

Re: [PATCH] btrfs: Add show_path function for btrfs_super_ops.

On Mon, Jul 21, 2014 at 05:02:29PM +0800, Qu Wenruo wrote:
 The show_path() function in struct super_operations is used to output
 subtree mount info for mountinfo.
 Without an implementation of show_path(), the user cannot find where
 each subvolume is mounted if using the 'subvolid=' mount option.
 (When mounted with the 'subvol=' mount option, the vfs is aware of the
 subtree mount and can do the path resolution itself.)

Your previous patches unify both to call mount_subtree; then the default
vfs implementation of show_path will do the right thing, i.e.
seq_dentry(...), and the path will be resolved for free.

That means this patch is not needed, so I'll skip commenting on it.

 With this patch, end users will be able to use findmnt(8) or other
 programs reading mountinfo to find which btrfs subvolume is mounted.
 
 Though we use fs_info->subvol_sem to protect show_path() from subvolume
 destroying/creating, if the user renames/moves the parent non-subvolume
 dir of a subvolume, it is still possible that a race may happen and
 cause btrfs_search_slot() to fail to find the desired key.
 In that case, we just return -EBUSY and inform the user to try again, since
 extra locking like locking the whole subvolume tree is too expensive for
 such usage.

And the subvolume renames will be handled as well.

Re: [PATCH 01/10] Btrfs: Fix the problem that the replace destroys the seed filesystem

On Thu, Jul 24, 2014 at 11:37:06AM +0800, Miao Xie wrote:
 The seed filesystem was destroyed by the device replace, the reproduce
 method is:
  # mkfs.btrfs -f dev0
  # btrfstune -S 1 dev0
  # mount dev0 mnt
  # btrfs device add dev1 mnt
  # umount mnt
  # mount dev1 mnt
  # btrfs replace start -f dev0 dev2 mnt
  # umount mnt
  # mount dev0 mnt
 
 It is because we erase the super block on the seed device. It is wrong,
 we should not change anything on the seed device.
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

Re: [PATCH 03/10] Btrfs: fix wrong fsid check of scrub

On Thu, Jul 24, 2014 at 11:37:08AM +0800, Miao Xie wrote:
 All the metadata in the seed devices has the same fsid as the fsid
 of the seed filesystem which is on the seed device, so we should check
 them by the current filesystem. Fix it.
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

 ---
  fs/btrfs/scrub.c | 18 +-
  1 file changed, 13 insertions(+), 5 deletions(-)
 
 diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
 index 23d3f6e..9a81874e 100644
 --- a/fs/btrfs/scrub.c
 +++ b/fs/btrfs/scrub.c
 @@ -1361,6 +1361,16 @@ static void scrub_recheck_block(struct btrfs_fs_info 
 *fs_info,
   return;
  }
  
 +static inline int scrub_check_fsid(u8 fsid[],

Please use 'const u8 *fsid' type.

 +struct scrub_page *spage)
 +{
 + struct btrfs_fs_devices *fs_devices = spage->dev->fs_devices;
 + int ret;
 +
 + ret = memcmp(fsid, fs_devices->fsid, BTRFS_UUID_SIZE);

ret is not necessary

 + return !ret;
 +}
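
A stand-alone sketch of the shape the two review comments point toward: a const pointer parameter and no temporary variable. The name `fsid_matches()` and the 16-byte size are assumptions for illustration (the size stands in for BTRFS_UUID_SIZE), and the second argument replaces the lookup through the scrub page.

```c
#include <string.h>

#define FSID_SIZE 16	/* stand-in for BTRFS_UUID_SIZE */

/* Returns nonzero when the two filesystem IDs are identical. */
static inline int fsid_matches(const unsigned char *fsid,
			       const unsigned char *dev_fsid)
{
	return memcmp(fsid, dev_fsid, FSID_SIZE) == 0;
}
```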

Re: [PATCH 04/10] Btrfs: fix wrong generation check of super block on a seed device

On Thu, Jul 24, 2014 at 11:37:09AM +0800, Miao Xie wrote:
 The super block generation of the seed devices is not the same as that of the
 filesystem which sprouted from them, because we don't update the super
 block on the seed devices when we change that new filesystem. So we
 should not use the generation of that new filesystem to check the super
 block generation on the seed devices. Fix it.
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>

Good catch.

Reviewed-by: David Sterba <dste...@suse.cz>

Re: [PATCH 02/10] Btrfs: don't write any data into a readonly device when scrub

On Thu, Jul 24, 2014 at 11:37:07AM +0800, Miao Xie wrote:
 We should not write data into a readonly device, especially a seed device,
 when doing scrub; skip those devices.
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>

Reviewed-by: David Sterba <dste...@suse.cz>

One minor comment below.

 @@ -2904,6 +2904,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 
 devid, u64 start,
   struct scrub_ctx *sctx;
   int ret;
   struct btrfs_device *dev;
 + struct rcu_string *name;
  
 + if (!is_dev_replace && !readonly && !dev->writeable) {

You can define 'name' within the block.

 + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 + rcu_read_lock();
 + name = rcu_dereference(dev->name);
 + btrfs_err(fs_info, "scrub: device %s is not writable",
 +   name->str);
 + rcu_read_unlock();
 + return -EROFS;
 + }

Re: [PATCH 06/10] Btrfs: Fix the problem that the dirty flag of dev stats is cleared

On Thu, Jul 24, 2014 at 11:37:11AM +0800, Miao Xie wrote:
 An I/O error might happen while writing out the device stats, and the
 device stats information and dirty flag would be updated at that time,
 but the current code didn't consider this case and just cleared the dirty
 flag, which would cause us to forget to write out the new device stats
 information. Fix it.
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
 ---
  fs/btrfs/volumes.c |  7 +--
  fs/btrfs/volumes.h | 19 +++
  2 files changed, 20 insertions(+), 6 deletions(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 19188df..0d37746 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -159,6 +159,7 @@ static struct btrfs_device *__alloc_device(void)
  
  spin_lock_init(&dev->reada_lock);
  atomic_set(&dev->reada_in_flight, 0);
 + atomic_set(&dev->dev_stats_ccnt, 0);
  INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
  INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
  
 @@ -6398,16 +6399,18 @@ int btrfs_run_dev_stats(struct btrfs_trans_handle 
 *trans,
  struct btrfs_root *dev_root = fs_info->dev_root;
  struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
  struct btrfs_device *device;
 + int stats_cnt;
  int ret = 0;
  
  mutex_lock(&fs_devices->device_list_mutex);
  list_for_each_entry(device, &fs_devices->devices, dev_list) {
 - if (!device->dev_stats_valid || !device->dev_stats_dirty)
 + if (!device->dev_stats_valid || !btrfs_dev_stats_dirty(device))

The helper btrfs_dev_stats_dirty is used only once and IMHO not
necessary.
   continue;
  
 + stats_cnt = atomic_read(&device->dev_stats_ccnt);

Here it is opencoded anyway.

   ret = update_dev_stat_item(trans, dev_root, device);
   if (!ret)
 - device->dev_stats_dirty = 0;
 + atomic_sub(stats_cnt, &device->dev_stats_ccnt);
  }
  mutex_unlock(&fs_devices->device_list_mutex);
  
 diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
 index 6fcc8ea..0defd23 100644
 --- a/fs/btrfs/volumes.h
 +++ b/fs/btrfs/volumes.h
 @@ -110,7 +110,9 @@ struct btrfs_device {
   /* disk I/O failure stats. For detailed description refer to
* enum btrfs_dev_stat_values in ioctl.h */
   int dev_stats_valid;
 - int dev_stats_dirty; /* counters need to be written to disk */
 +
 + /* Counter to record the change of device stats */
 + atomic_t dev_stats_ccnt;

dev_stats_dirty is more descriptive, please keep it. The counter
semantics can be documented here.

   atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
  };
  
 @@ -359,11 +361,18 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root 
 *root,
  int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
   struct btrfs_root *extent_root,
   u64 chunk_offset, u64 chunk_size);
 +
 +static inline int btrfs_dev_stats_dirty(struct btrfs_device *dev)
 +{
 + return atomic_read(&dev->dev_stats_ccnt);

IMHO too trivial, not necessary.

 +}
 +
  static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 int index)
  {
  atomic_inc(dev->dev_stat_values + index);
 - dev->dev_stats_dirty = 1;

 + smp_mb__before_atomic();
 + atomic_inc(&dev->dev_stats_ccnt);

Please put the two lines into a wrapper, 3 times repeating the same is
worth it.

 @@ -378,7 +387,8 @@ static inline int btrfs_dev_stat_read_and_reset(struct 
 btrfs_device *dev,
   int ret;
  
  ret = atomic_xchg(dev->dev_stat_values + index, 0);
 - dev->dev_stats_dirty = 1;
 + smp_mb__before_atomic();
 + atomic_inc(&dev->dev_stats_ccnt);

 @@ -386,7 +396,8 @@ static inline void btrfs_dev_stat_set(struct btrfs_device 
 *dev,
 int index, unsigned long val)
  {
  atomic_set(dev->dev_stat_values + index, val);
 - dev->dev_stats_dirty = 1;
 + smp_mb__before_atomic();
 + atomic_inc(&dev->dev_stats_ccnt);
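
The change-counter idea and the requested wrapper can be sketched in user space. This is an illustration under assumptions, not the kernel code: C11 `<stdatomic.h>` with its default sequentially consistent ordering stands in for atomic_t plus the explicit barrier, and every name here (`dev_stats_mark_changed()`, `dev_stats_flush()`, the two writer callbacks) is hypothetical.

```c
#include <stdatomic.h>

#define NR_STATS 4	/* hypothetical number of stat counters */

struct dev_stats {
	atomic_int values[NR_STATS];
	atomic_int ccnt;	/* changes not yet written out */
};

static void dev_stats_init(struct dev_stats *s)
{
	for (int i = 0; i < NR_STATS; i++)
		atomic_init(&s->values[i], 0);
	atomic_init(&s->ccnt, 0);
}

/* One wrapper for the "note a change" step shared by every updater. */
static void dev_stats_mark_changed(struct dev_stats *s)
{
	atomic_fetch_add(&s->ccnt, 1);
}

static void dev_stat_inc(struct dev_stats *s, int index)
{
	atomic_fetch_add(&s->values[index], 1);
	dev_stats_mark_changed(s);
}

/*
 * Write the stats out and subtract only the changes seen before the
 * write started. Updates that race with the write leave ccnt nonzero,
 * so the next flush picks them up.
 */
static int dev_stats_flush(struct dev_stats *s,
			   int (*write_out)(struct dev_stats *))
{
	int seen = atomic_load(&s->ccnt);
	int ret;

	if (!seen)
		return 0;
	ret = write_out(s);
	if (!ret)
		atomic_fetch_sub(&s->ccnt, seen);
	return ret;
}

/* Hypothetical writers used only to demonstrate the behaviour. */
static int write_ok(struct dev_stats *s)
{
	(void)s;
	return 0;
}

static int write_with_racing_error(struct dev_stats *s)
{
	dev_stat_inc(s, 0);	/* simulates an I/O error during the write */
	return 0;
}
```

A plain dirty flag cleared after the write would lose the update made by `write_with_racing_error()`; with the counter, one change remains pending after the first flush.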

[PATCH] Btrfs: fix compressed write corruption on enospc

2014-07-24 Thread Liu Bo

When failing to allocate space for the whole compressed extent, we'll
fallback to uncompressed IO, but we've forgotten to redirty the pages
which belong to this compressed extent, and these 'clean' pages will
simply skip 'submit' part and go to endio directly, at last we got data
corruption as we write nothing.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/inode.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ea7610 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -709,6 +709,18 @@ retry:
 				unlock_extent(io_tree, async_extent->start,
 					      async_extent->start +
 					      async_extent->ram_size - 1);
+
+				/*
+				 * we need to redirty the pages if we decide to
+				 * fallback to uncompressed IO, otherwise we
+				 * will not submit these pages down to lower
+				 * layers.
+				 */
+				extent_range_redirty_for_io(inode,
+						async_extent->start,
+						async_extent->start +
+						async_extent->ram_size - 1);
+
 				goto retry;
 			}
 			goto out_free;
-- 
1.8.1.4


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

On 07/24/2014 10:48 AM, Liu Bo wrote:
 When failing to allocate space for the whole compressed extent, we'll
 fallback to uncompressed IO, but we've forgotten to redirty the pages
 which belong to this compressed extent, and these 'clean' pages will
 simply skip 'submit' part and go to endio directly, at last we got data
 corruption as we write nothing.

This fallback code was my #1 suspect for the hangs people have been
seeing since 3.15.  I changed things around to trigger the fallback
randomly and wasn't able to trigger problems, but I was looking for
hangs and not corruptions.

-chris



Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

On 07/23/2014 06:47 PM, Martin Steigerwald wrote:
 Am Dienstag, 15. Juli 2014, 17:08:27 schrieb Martin Steigerwald:
 Am Dienstag, 15. Juli 2014, 09:21:40 schrieb Chris Mason:
 On 07/14/2014 05:58 PM, Martin Steigerwald wrote:
 Am Montag, 14. Juli 2014, 16:12:22 schrieb Chris Mason:
 On 07/14/2014 11:10 AM, Martin Steigerwald wrote:
 Am Montag, 14. Juli 2014, 17:04:22 schrieben Sie:
 Hi!

 While with 3.16-rc3 and rc4 I didn't have a BTRFS hang in several days of
 usage, with 3.16-rc5 I had a hang again, less than an hour after booting
 it.

 Since the hang bug I and others had with 3.15 and up to 3.16-rc2 usually
 didn't happen that quickly after boot, and since the backtrace looks a bit
 different from what I remember, I post this in a new thread.
 See thread "Blocked tasks on 3.15.1" for a discussion of previous hang
 issues.

 Probably good to add some basic information on the filesystem:
 Do you have compression enabled?  I wasn't able to nail down the 3.15.1
 hang before vacation attacked me, but I'm hoping to track it down
 today.

 Yes. I have.

 It just hung again while I was playing PlaneShift.

 Back to 3.16-rc4 as rc5 seems to be broke here.

 The btrfs hang you're hitting goes back to 3.15.  So 3.16-rc4 vs rc5
 shouldn't be a factor.  Are you hitting other problems with 3.16?

 So far for this day 3.16-rc4 behaves nicely. With 3.16-rc5 I had a BTRFS
 hang twice yesterday. 3.16-rc4 before also behaved nicely for several days
 or well about a week here.
 
 3.16-rc4 now hung as well…

Liu Bo has a promising patch:

https://patchwork.kernel.org/patch/4618421/

Please give it a shot.  There's a second deadlock reading the free space
cache, I'm still working on that one too.

-chris

[PATCH] btrfs-progs: add zero-log to rescue subcommand

Copy the functionality of standalone btrfs-zero-log to the main tool.
The standalone utility will be removed later.

Signed-off-by: David Sterba <dste...@suse.cz>
---
 cmds-rescue.c | 49 -
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/cmds-rescue.c b/cmds-rescue.c
index f20a2068a16b..d76a879d543a 100644
--- a/cmds-rescue.c
+++ b/cmds-rescue.c
@@ -19,6 +19,9 @@
 #include "kerncompat.h"
 
 #include <getopt.h>
+#include "ctree.h"
+#include "transaction.h"
+#include "disk-io.h"
 #include "commands.h"
 #include "utils.h"
 
@@ -149,11 +152,55 @@ int cmd_super_recover(int argc, char **argv)
 	return ret;
 }
 
+const char * const cmd_rescue_zero_log_usage[] = {
+	"btrfs rescue zero-log <device>",
+	"Clear the tree log. Usable if it's corrupted and prevents mount.",
+	"",
+	NULL
+};
+
+int cmd_rescue_zero_log(int argc, char **argv)
+{
+	struct btrfs_root *root;
+	struct btrfs_trans_handle *trans;
+	char *devname;
+	int ret;
+
+	if (check_argc_exact(argc, 2))
+		usage(cmd_rescue_zero_log_usage);
+
+	devname = argv[optind];
+	ret = check_mounted(devname);
+	if (ret < 0) {
+		fprintf(stderr, "Could not check mount status: %s\n", strerror(-ret));
+		return 1;
+	} else if (ret) {
+		fprintf(stderr, "%s is currently mounted. Aborting.\n", devname);
+		return 1;
+	}
+
+	root = open_ctree(devname, 0, OPEN_CTREE_WRITES);
+	if (!root) {
+		fprintf(stderr, "Could not open ctree\n");
+		return 1;
+	}
+
+	printf("Clearing log on %s\n", devname);
+	trans = btrfs_start_transaction(root, 1);
+	btrfs_set_super_log_root(root->fs_info->super_copy, 0);
+	btrfs_set_super_log_root_level(root->fs_info->super_copy, 0);
+	btrfs_commit_transaction(trans, root);
+	close_ctree(root);
+
+	return 0;
+}
+
 const struct cmd_group rescue_cmd_group = {
 	rescue_cmd_group_usage, NULL, {
 		{ "chunk-recover", cmd_chunk_recover, cmd_chunk_recover_usage, NULL, 0},
 		{ "super-recover", cmd_super_recover, cmd_super_recover_usage, NULL, 0},
-		{ 0, 0, 0, 0, 0 }
+		{ "zero-log", cmd_rescue_zero_log, cmd_rescue_zero_log_usage, NULL, 0},
+		NULL_CMD_STRUCT
 	}
 };
 
-- 
1.9.0


Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

2014-07-24 Thread Martin Steigerwald

On Thursday, 24 July 2014, 10:58:51, Chris Mason wrote:
 On 07/23/2014 06:47 PM, Martin Steigerwald wrote:
  On Tuesday, 15 July 2014, 17:08:27, Martin Steigerwald wrote:
  On Tuesday, 15 July 2014, 09:21:40, Chris Mason wrote:
  On 07/14/2014 05:58 PM, Martin Steigerwald wrote:
  On Monday, 14 July 2014, 16:12:22, Chris Mason wrote:
  On 07/14/2014 11:10 AM, Martin Steigerwald wrote:
  On Monday, 14 July 2014, 17:04:22, you wrote:
  Hi!
  
  While with 3.16-rc3 and rc4 I didn´t have a BTRFS hang in several
  days
  of
  usage, with 3-16-rc5 I had a hang again. Less than a hour since
  booting
  it.
  
  Since the hang bug I and others had with 3.15 and upto 3.16-rc2
  usually
  didn´t happen that quickly after boot and since backtrace looks a
  bit
  different from what I have in memory, I post this in a new thread.
  See thread Blocked tasks on 3.15.1 for a discussion of previous
  hang
  issues.
  
  Probably good to add some basic information on the filesystem:
  Do you have compression enabled?  I wasn't able to nail down the
  3.15.1
  hang before vacation attacked me, but I'm hoping to track it down
  today.
  
  Yes. I have.
  
  It just hung again while I was playing PlaneShift.
  
  Back to 3.16-rc4 as rc5 seems to be broke here.
  
  The btrfs hang you're hitting goes back to 3.15.  So 3.16-rc4 vs rc5
  shouldn't be a factor.  Are you hitting other problems with 3.16?
  
  So far for this day 3.16-rc4 behaves nicely. With 3.16-rc5 I had a BTRFS
  hang twice yesterday. 3.16-rc4 before also behaved nicely for several
  days
  or well about a week here.
  
  3.16-rc4 now hung as well…
 
 Liu Bo has a promising patch:
 
 https://patchwork.kernel.org/patch/4618421/
 
 Please give it a shot.  There's a second deadlock reading the free space
 cache, I'm still working on that one too.

Okay, I reverted your printk patch and applied this one on top of Linus' git tree.

Let's see how this works.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Re: 1 week to rebuid 4x 3TB raid10 is a long time!

2014-07-24 Thread Chris Murphy


On Jul 22, 2014, at 11:13 AM, Chris Murphy li...@colorremedies.com wrote:
 
 It's been a while since I did a rebuild on HDDs, 

So I did this yesterday and day before with an SSD and HDD in raid1, and made 
the HDD do the rebuild. 


Baseline for this hard drive:
hdparm -t
35.68 MB/sec

dd if=/dev/zero of=/dev/rdisk2s1 bs=256k
13508091392 bytes transferred in 521.244920 secs (25915056 bytes/sec)

I don't know why hdparm gets such good reads, and dd writes are 75% of that, 
but the 26MB/s write speed is realistic (this is a Firewire 400 external 
device) and what I typically get with long sequential writes. It's probable 
this is interface limited to mode S200, not a drive limitation since on SATA 
Rev 2 or 3 interface I get 100+MB/s transfers.

During the rebuild, iotop reports actual write averaging in the 24MB/s range, 
and the total data to restore divided by total time for the replace command 
comes out to 23MB/s. The source data is a Fedora 21 install with no meaningful 
user data (cache files and such), so mostly a bunch of libraries, programs, and 
documentation. Therefore it's not exclusively small files, yet the iotop rate 
was very stable throughout the 4 minute rebuild.

So I still think 5MB/s for a SATA connected (?) drive is unexpected.

 
Chris Murphy

Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

2014-07-24 Thread Martin Steigerwald

On Thursday, 24 July 2014, 10:58:51, Chris Mason wrote:
 On 07/23/2014 06:47 PM, Martin Steigerwald wrote:
  On Tuesday, 15 July 2014, 17:08:27, Martin Steigerwald wrote:
  On Tuesday, 15 July 2014, 09:21:40, Chris Mason wrote:
  On 07/14/2014 05:58 PM, Martin Steigerwald wrote:
  On Monday, 14 July 2014, 16:12:22, Chris Mason wrote:
  On 07/14/2014 11:10 AM, Martin Steigerwald wrote:
  On Monday, 14 July 2014, 17:04:22, you wrote:
  Hi!
  
  While with 3.16-rc3 and rc4 I didn´t have a BTRFS hang in several
  days
  of
  usage, with 3-16-rc5 I had a hang again. Less than a hour since
  booting
  it.
  
  Since the hang bug I and others had with 3.15 and upto 3.16-rc2
  usually
  didn´t happen that quickly after boot and since backtrace looks a
  bit
  different from what I have in memory, I post this in a new thread.
  See thread Blocked tasks on 3.15.1 for a discussion of previous
  hang
  issues.
  
  Probably good to add some basic information on the filesystem:
  Do you have compression enabled?  I wasn't able to nail down the
  3.15.1
  hang before vacation attacked me, but I'm hoping to track it down
  today.
  
  Yes. I have.
  
  It just hung again while I was playing PlaneShift.
  
  Back to 3.16-rc4 as rc5 seems to be broke here.
  
  The btrfs hang you're hitting goes back to 3.15.  So 3.16-rc4 vs rc5
  shouldn't be a factor.  Are you hitting other problems with 3.16?
  
  So far for this day 3.16-rc4 behaves nicely. With 3.16-rc5 I had a BTRFS
  hang twice yesterday. 3.16-rc4 before also behaved nicely for several
  days
  or well about a week here.
  
  3.16-rc4 now hung as well…
 
 Liu Bo has a promising patch:
 
 https://patchwork.kernel.org/patch/4618421/
 
 Please give it a shot.  There's a second deadlock reading the free space
 cache, I'm still working on that one too.

Now running 3.16-rc6 + current git + this patch.

It may take some time though, because BTRFS hung again while I was compiling
the kernel, which caused the loss of the KDE Baloo desktop search file index
and parts of a mail I wrote in KMail.

Since the patch mentioned ENOSPC issues but the filesystem has enough free
space according to df, I shrunk the trees with

btrfs balance start -musage=50 /home
btrfs balance start -musage=50 /home


merkaba:~ btrfs fi sh /home   
Label: 'home'  uuid: […]
Total devices 2 FS bytes used 124.05GiB
devid1 size 160.00GiB used 150.00GiB path /dev/mapper/msata-home
devid2 size 160.00GiB used 150.00GiB path /dev/dm-3


As I bet that the error is more likely to happen when trees occupy all space,
it may take some time till it happens again.

Well, it's growing slowly already:

merkaba:~ btrfs fi df /home
Data, RAID1: total=146.97GiB, used=121.84GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=4.00GiB, used=2.62GiB
unknown, single: total=512.00MiB, used=0.00
merkaba:~ btrfs fi sh /home
Label: 'home'  uuid: […]
Total devices 2 FS bytes used 124.46GiB
devid1 size 160.00GiB used 151.00GiB path /dev/dm-0
devid2 size 160.00GiB used 151.00GiB path /dev/mapper/sata-home

Btrfs v3.14.1


I wonder why ENOSPC conditions happen with that much free space inside the
trees. Were they just too fragmented?

To me

merkaba:~ LANG=C df -hT /home
Filesystem Type   Size  Used Avail Use% Mounted on
/dev/dm-0  btrfs  320G  249G   69G  79% /home

is a quite healthy free space margin.


Well, let's see how this goes.

I hope it can be fixed soon, as it causes loss of recently saved data, and a
BTRFS hang generally locks up a machine running a KDE desktop quite quickly.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Re: [PATCH 2/4 v3] fiemap: add EXTENT_DATA_COMPRESSED flag

On Thu, Jul 17, 2014 at 12:07:57AM -0600, Andreas Dilger wrote:
 any progress on this patch series?

I'm sorry, I got distracted at the end of the year and did not finish the
series.

 I never saw an updated version of this patch series after the last round of
 reviews, but it would be great to move it forward.  I have filefrag patches
 in my e2fsprogs tree waiting for an updated version of your patch.
 
 I recall the main changes were:
 - add FIEMAP_EXTENT_PHYS_LENGTH flag to indicate if fe_phys_length was valid

fe_phys_length will always be valid, so the other flags are set only if it's
not equal to the logical length.

 - rename fe_length to fe_logi_length and #define fe_length fe_logi_length
 - always fill in fe_phys_length (= fe_logi_length for uncompressed files)
   and set FIEMAP_EXTENT_PHYS_LENGTH whether the extent is compressed or not

This is my understanding and contradicts the first point.

 - add WARN_ONCE() in fiemap_fill_next_extent() as described below

 I don't know if there was any clear statement about whether there should be
 separate FIEMAP_EXTENT_PHYS_LENGTH and FIEMAP_EXTENT_DATA_COMPRESSED flags,
 or if the latter should be implicit?  Probably makes sense to have separate
 flags.  It should be fine to use:

 #define FIEMAP_EXTENT_PHYS_LENGTH 0x0010
 
 since this flag was never used.

I've kept only FIEMAP_EXTENT_DATA_COMPRESSED, I don't see a need for
FIEMAP_EXTENT_PHYS_LENGTH and this would be yet another flag because the
FIEMAP_EXTENT_DATA_ENCODED is also implied.

I'll send V4, we can discuss the PHYS_LENGTH flag then.

Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

On 07/24/2014 02:49 PM, Martin Steigerwald wrote:
On Thursday, 24 July 2014, 10:58:51, Chris Mason wrote:
On 07/23/2014 06:47 PM, Martin Steigerwald wrote:
On Tuesday, 15 July 2014, 17:08:27, Martin Steigerwald wrote:
On Tuesday, 15 July 2014, 09:21:40, Chris Mason wrote:
On 07/14/2014 05:58 PM, Martin Steigerwald wrote:
On Monday, 14 July 2014, 16:12:22, Chris Mason wrote:
On 07/14/2014 11:10 AM, Martin Steigerwald wrote:
On Monday, 14 July 2014, 17:04:22, you wrote:
Hi!

While with 3.16-rc3 and rc4 I didn´t have a BTRFS hang in several
days
of
usage, with 3-16-rc5 I had a hang again. Less than a hour since
booting
it.

Since the hang bug I and others had with 3.15 and upto 3.16-rc2
usually
didn´t happen that quickly after boot and since backtrace looks a
bit
different from what I have in memory, I post this in a new thread.
See thread Blocked tasks on 3.15.1 for a discussion of previous
hang
issues.

Probably good to add some basic information on the filesystem:
Do you have compression enabled? I wasn't able to nail down the
3.15.1
hang before vacation attacked me, but I'm hoping to track it down
today.

Yes. I have.

It just hung again while I was playing PlaneShift.

Back to 3.16-rc4 as rc5 seems to be broke here.

The btrfs hang you're hitting goes back to 3.15. So 3.16-rc4 vs rc5
shouldn't be a factor. Are you hitting other problems with 3.16?

So far for this day 3.16-rc4 behaves nicely. With 3.16-rc5 I had a BTRFS
hang twice yesterday. 3.16-rc4 before also behaved nicely for several
days
or well about a week here.

3.16-rc4 now hung as well…

Liu Bo has a promising patch:

https://patchwork.kernel.org/patch/4618421/

Please give it a shot. There's a second deadlock reading the free space
cache, I'm still working on that one too.

Now running 3.16-rc6 + current git + this patch.

It may take some time though, because BTRFS hung again while I was compiling
the kernel, which caused the loss of the KDE Baloo desktop search file index
and parts of a mail I wrote in KMail.

Since the patch mentioned ENOSPC issues but the filesystem has enough free
space according to df, I shrunk the trees with

Thanks for giving it a try. The ENOSPC mentioned here is looking for a
contiguous extent, so it's easily possible to trigger that enospc
without actually being full.
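
As a rough illustration of the point above (not btrfs code): if free space is
modelled as a list of fragments, the total can be large while no single
fragment is big enough for a contiguous request. The struct and function
names below are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-space fragment; not a btrfs structure. */
struct free_frag {
	uint64_t start;
	uint64_t len;
};

/* Sum of all free bytes, roughly the number a tool like df would show. */
static uint64_t total_free(const struct free_frag *f, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += f[i].len;
	return sum;
}

/*
 * Returns 1 if some single fragment can hold `want` contiguous bytes,
 * 0 otherwise. The 0 case is "ENOSPC without actually being full".
 */
static int can_alloc_contig(const struct free_frag *f, size_t n, uint64_t want)
{
	for (size_t i = 0; i < n; i++)
		if (f[i].len >= want)
			return 1;
	return 0;
}
```

With fragments of 4 KiB, 8 KiB and 4 KiB, total_free() reports 16 KiB, yet
can_alloc_contig() refuses a 12 KiB request.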

-chris

Re: [PATCH, RFC] btrfs: refactor open_ctree()

On 06/25/2014 07:55 PM, Eric Sandeen wrote:
 First off: total RFC, don't merge this; it builds, but
 is totally untested.
 
 open_ctree() is almost 1000 lines long.  I've started trying
 to refactor it, primarily into helper functions, and also
 simplifying (?) things a bit at the beginning by removing the
 ret = func(); if (ret) { err = ret; goto ... } dance where it's not
 needed.
 
 Does this look like a reasonable thing to do?  Have I cut
 things into the right chunks?  Would you rather see it as
 a series of patches, moving one hunk of code at a time?

I do love this patch, either as a series or one big patch.  Whatever
makes it easiest for you to test is fine with me.

-chris

Re: [PATCH RFC] btrfs: Use backup superblocks if and only if the first superblock is valid but corrupted.



On 06/26/2014 11:53 PM, Qu Wenruo wrote:
 Current btrfs will only use the first superblock, making the backup
 superblocks only useful for the 'btrfs rescue super' command.
 
 The old problem is that if we use backup superblocks when the first
 superblock is not valid, we will be able to mount a non-btrfs
 filesystem that used to contain btrfs but has since had another fs
 made on it.
 
 The old problem can be solved relatively easily by checking the first
 superblock in a special way:
 1) If the magic number in the first superblock does not match:
    This filesystem is not btrfs anymore, just exit.
    If the end user considers it to really be btrfs, then the old
    'btrfs rescue super' method is still available.
 
 2) If the magic number in the first superblock matches but the checksum
    does not match:
    This filesystem is btrfs but the first superblock is corrupted, use
    backup roots. Just continue searching the remaining superblocks.
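
The two cases proposed above can be written down as a small decision
function. This is only a sketch of the proposed policy, not the patch itself:
the enum, the function name and the idea of passing in two pre-computed
booleans are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical outcome names, for illustration only. */
enum super_action {
	SUPER_USE_PRIMARY,	/* magic and checksum are both fine */
	SUPER_TRY_BACKUPS,	/* case 2: magic matches, checksum does not */
	SUPER_NOT_BTRFS		/* case 1: magic does not match */
};

/* Decide what to do based on the state of the first superblock. */
static enum super_action classify_primary_super(bool magic_ok, bool csum_ok)
{
	if (!magic_ok)
		return SUPER_NOT_BTRFS;		/* never look at the backups */
	if (!csum_ok)
		return SUPER_TRY_BACKUPS;	/* same fs, damaged first copy */
	return SUPER_USE_PRIMARY;
}
```

Note that a bad magic number wins even if the checksum happens to match, so
a device that was reformatted with another filesystem is never mounted from
a stale backup.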

I do agree that in these cases we can trust that the backup superblock
comes from the same filesystem.

But, for right now I'd prefer the admin get involved in using the backup
supers.  I think silently using the backups is going to lead to surprises.

Thanks!

-chris

Re: [PATCH 2/4 v3] fiemap: add EXTENT_DATA_COMPRESSED flag

2014-07-24 Thread Andreas Dilger


On Jul 24, 2014, at 1:22 PM, David Sterba dste...@suse.cz wrote:
 On Thu, Jul 17, 2014 at 12:07:57AM -0600, Andreas Dilger wrote:
 any progress on this patch series?
 
 I'm sorry I got distracted at the end of year and did not finish the
 series.
 
 I never saw an updated version of this patch series after the last round of
 reviews, but it would be great to move it forward.  I have filefrag patches
 in my e2fsprogs tree waiting for an updated version of your patch.
 
 I recall the main changes were:
 - add FIEMAP_EXTENT_PHYS_LENGTH flag to indicate if fe_phys_length was valid
 
 fe_phys_length will always be valid, so the other flags are set only if it's
 not equal to the logical length.
 
 - rename fe_length to fe_logi_length and #define fe_length fe_logi_length
 - always fill in fe_phys_length (= fe_logi_length for uncompressed files)
  and set FIEMAP_EXTENT_PHYS_LENGTH whether the extent is compressed or not
 
 This is my understanding and contradicts the first point.

I think Dave Chinner's former point was that having fe_phys_length validity
depend on FIEMAP_EXTENT_DATA_COMPRESSED is a non-intuitive interface.  It is
not true that fe_phys_length would always be valid, since that is not the
case for older kernels that currently always set this field to 0, so they
need some flag to indicate if fe_phys_length is valid.  Alternately,
userspace could do:

	if (ext->fe_phys_length == 0)
		ext->fe_phys_length = ext->fe_logi_length;

but that pre-supposes that fe_phys_length == 0 is never a valid value when
fe_logi_length is non-zero, and this might introduce errors in some cases.
I could imagine that some compression methods might not allocate any space
at all if it was all zeroes, and just store a bit in the blockpointer or
extent, so having a separate FIEMAP_EXTENT_PHYS_LENGTH is probably safer
in the long run.  That opens up the question of whether a written zero
filled space that gets compressed away is different from a hole, but I'd
prefer to just return whatever the file mapping is than interpret it.
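
To make the difference concrete, here is a small userspace sketch comparing
the two approaches. The struct is a cut-down stand-in for the proposed extent
record (field names follow this thread), and the flag value is the one
suggested earlier in the thread, not a merged kernel constant; both helper
names are made up.

```c
#include <stdint.h>

/* Hypothetical subset of the proposed extent record. */
struct ext_sketch {
	uint64_t fe_logi_length;
	uint64_t fe_phys_length;
	uint32_t fe_flags;
};

/* Value proposed in this thread; an assumption, not a kernel define. */
#define SKETCH_EXTENT_PHYS_LENGTH 0x0010

/* Trust fe_phys_length only when the kernel says it filled it in. */
static uint64_t phys_len_by_flag(const struct ext_sketch *e)
{
	if (e->fe_flags & SKETCH_EXTENT_PHYS_LENGTH)
		return e->fe_phys_length;	/* may legitimately be 0 */
	return e->fe_logi_length;		/* old kernel: field left at 0 */
}

/* The "== 0 means unset" heuristic from the snippet above. */
static uint64_t phys_len_by_zero_check(const struct ext_sketch *e)
{
	return e->fe_phys_length ? e->fe_phys_length : e->fe_logi_length;
}
```

For an extent that a new kernel reports with the flag set and a physical
length of 0, the flag-based helper returns 0 while the heuristic returns the
logical length, which is the kind of error I am worried about.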

Cheers, Andreas

 - add WARN_ONCE() in fiemap_fill_next_extent() as described below
 
 I don't know if there was any clear statement about whether there should be
 separate FIEMAP_EXTENT_PHYS_LENGTH and FIEMAP_EXTENT_DATA_COMPRESSED flags,
 or if the latter should be implicit?  Probably makes sense to have separate
 flags.  It should be fine to use:
 
 #define FIEMAP_EXTENT_PHYS_LENGTH 0x0010
 
 since this flag was never used.
 
 I've kept only FIEMAP_EXTENT_DATA_COMPRESSED, I don't see a need for
 FIEMAP_EXTENT_PHYS_LENGTH and this would be yet another flag because the
 FIEMAP_EXTENT_DATA_ENCODED is also implied.
 
 I'll send V4, we can discuss the PHYS_LENGTH flag then.


Cheers, Andreas








Re: [PATCH, RFC] btrfs: refactor open_ctree()

2014-07-24 Thread Eric Sandeen

On 7/24/14, 4:25 PM, Chris Mason wrote:
 On 06/25/2014 07:55 PM, Eric Sandeen wrote:
 First off: total RFC, don't merge this; it builds, but
 is totally untested.

 open_ctree() is almost 1000 lines long.  I've started trying
 to refactor it, primarily into helper functions, and also
 simplifying (?) things a bit at the beginning by removing the
 ret = func(); if (ret) { err = ret; goto ... } dance where it's not
 needed.

 Does this look like a reasonable thing to do?  Have I cut
 things into the right chunks?  Would you rather see it as
 a series of patches, moving one hunk of code at a time?
 
 I do love this patch, either as a series or one big patch.  Whatever
 makes it easiest for you to test is fine with me.

Oh right!  I remember this thing!  Let me try to get back to it... ;)

Thanks,
-Eric


Re: [PATCH] btrfs: Add show_path function for btrfs_super_ops.

Thanks for the comment.

 Original Message 
Subject: Re: [PATCH] btrfs: Add show_path function for btrfs_super_ops.
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-24 21:09

On Mon, Jul 21, 2014 at 05:02:29PM +0800, Qu Wenruo wrote:

The show_path() function in struct super_operations is used to output
subtree mount info for mountinfo.
Without an implementation of show_path(), users cannot find where
each subvolume is mounted when using the 'subvolid=' mount option.
(When mounted with the 'subvol=' mount option, vfs is aware of the subtree
mount and can do the path resolution itself.)

Your previous patches unify both to call mount_subtree, then the default
vfs implementation of show_path will do the right thing, ie
seq_dentry(...), and the path will be resolved for free.

Means this patch is not needed, so I'll skip commenting it.

I'm sorry that I forgot to mention that this patch is going to replace the
previous patch (the one using the mount_subtree method).

Since vfs provides the show_path() function for fs-specific subtree display,
I would like to use it rather than the previous mount_subtree() trick.

Also, as mentioned by Chandan Rajendra, the previous subtree patch can't
handle a subvolume behind a normal directory.

This show_path() patch is effectively the v2 of the previous patch.

With this patch, end users will be able to use findmnt(8) or other
programs reading mountinfo to find which btrfs subvolume is mounted.

Though we use fs_info->subvol_sem to protect show_path() from subvolume
destruction/creation, if the user renames/moves the parent non-subvolume
dir of a subvolume, a race is still possible that can cause
btrfs_search_slot() to fail to find the desired key.
In that case, we just return -EBUSY and inform the user to try again, since
extra locking like locking the whole subvolume tree is too expensive for
such usage.

And the subvolume renames will be handled as well.


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-07-24 Thread Liu Bo

On Thu, Jul 24, 2014 at 10:55:47AM -0400, Chris Mason wrote:
 On 07/24/2014 10:48 AM, Liu Bo wrote:
  When failing to allocate space for the whole compressed extent, we'll
  fallback to uncompressed IO, but we've forgotten to redirty the pages
  which belong to this compressed extent, and these 'clean' pages will
  simply skip 'submit' part and go to endio directly, at last we got data
  corruption as we write nothing.
 
 This fallback code was my #1 suspect for the hangs people have been
 seeing since 3.15.  I changed things around to trigger the fallback
 randomly and wasn't able to trigger problems, but I was looking for
 hangs and not corruptions.
 

So now you're able to trigger the hang without changing the fallback code?

I tried raid1 and raid0 with fsmark and rsync in different ways but still
failed to reproduce the hang :-(

The weirdest thing is who the hell holds the free space inode's page. Is it
possible to share pages with another inode? (My answer is NO, but I'm not sure
now...)

thanks,
-liubo

 -chris
 
 

Re: [PATCH 1/2] btrfs: Call mount_subtree() even 'subvolid=' mount option is given.


Thanks for your comment.

I'm very sorry that this patch took up your time to review, but the later
patch (the show_path one) should replace this patch.

As mentioned in that thread, this patch does not completely work.
In fact, the show_path() patch is the v2 of this patch, but because the
patch name changed, I didn't add the v2 tag.

Thanks,
Qu

 Original Message 
Subject: Re: [PATCH 1/2] btrfs: Call mount_subtree() even 'subvolid=' 
mount option is given.

From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-24 20:48

On Wed, Jul 16, 2014 at 12:07:10PM +0800, Qu Wenruo wrote:

btrfs uses different routines to handle the 'subvolid=' and 'subvol=' mount
options.
Given the 'subvol=' mount option, btrfs will mount btrfs first and then call
mount_subtree() to mount a subtree of btrfs, making vfs handle the path
searching.
This is good since the vfs layer knows exactly that a subtree mount is done
and findmnt(8) knows which subtree is mounted.

However, when using the 'subvolid=' mount option, btrfs will do all the
internal subvolume objectid searching and checking, making VFS unaware
of which subtree is mounted. As a result, findmnt(8) can't show any
useful subtree mount info to end users.

This patch will use the root backref to reverse search the subvolume
path for a given subvolid, making findmnt(8) work again.

Reported-by: Stefan G.Weichinger li...@xunil.at
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com

Ack for unifying the way subvol= and subvolid= are handled, but I don't
like some aspects of the implementation.

The kmalloc/krealloc makes it really complicated and is not imho
necessary. The mount options length is limited to PAGE_SIZE in the vfs
code. Do the same here, allocate a page, filter the options, do the
necessary processing and just check for overflows.
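
A userspace sketch of the fixed-buffer approach described above, to show how
little code the filtering needs once the output size is bounded. This is not
the kernel implementation: OPT_BUF_SIZE stands in for PAGE_SIZE and
filter_subvol_opts() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define OPT_BUF_SIZE 4096	/* stand-in for PAGE_SIZE; an assumption */

/*
 * Copy the comma-separated options in args into buf, dropping any
 * "subvol=" or "subvolid=" entry, then append "subvolid=<id>".
 * Returns 0 on success, -1 if the result would not fit in size bytes.
 */
static int filter_subvol_opts(const char *args, unsigned long long id,
			      char *buf, size_t size)
{
	size_t used = 0;
	const char *p = args ? args : "";
	int w;

	while (*p) {
		size_t n = strcspn(p, ",");	/* length of this option */
		int skip = strncmp(p, "subvol=", 7) == 0 ||
			   strncmp(p, "subvolid=", 9) == 0;

		if (!skip && n > 0) {
			/* separator (if any) + option + final NUL must fit */
			if (used + (used ? 1 : 0) + n + 1 > size)
				return -1;
			if (used)
				buf[used++] = ',';
			memcpy(buf + used, p, n);
			used += n;
		}
		p += n;
		if (*p == ',')
			p++;
	}

	w = snprintf(buf + used, size - used, "%ssubvolid=%llu",
		     used ? "," : "", id);
	if (w < 0 || (size_t)w >= size - used)
		return -1;	/* overflow: treat as an error */
	return 0;
}
```

Called with "rw,subvol=home,noatime" and id 0, this leaves
"rw,noatime,subvolid=0" in the buffer; a NULL args string yields just the
appended "subvolid=" option.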

You can drop u64_to_strlen.


+#define CLEAR_SUBVOL   1
+#define CLEAR_SUBVOLID 2

Though they're internal and local to the file, please add BTRFS_ prefix
at least.


  /*
- * This will strip out the subvol=%s argument for an argument string and add
- * subvolid=0 to make sure we get the actual tree root for path walking to the
- * subvol we want.
+ * This will strip out the subvol=%s or subvolid=%s argument for an argumen
+ * string and add subvolid=0 to make sure we get the actual tree root for path
+ * walking to the subvol we want.
   */
-static char *setup_root_args(char *args)
+static char *setup_root_args(char *args, int flags, u64 subvol_objectid)
  {
-   unsigned len = strlen(args) + 2 + 1;
-   char *src, *dst, *buf;
+   unsigned len;
+   char *src = NULL, *dst, *buf, *comma;

Please use the recommended style and put each on a separate line. I'm
not sure if you'll need all of them for the implementation without the
kmallocs, the comment applies generally.


+   char *subvol_string = "subvolid=";
+   int option_len = 0;
+
+   if (!args) {
+   /* Case 1, not args, all default mounting
+* just return 'subvolid=FS_ROOT' */

Not the preferred style of comments.


+   len = strlen(subvol_string) +
+ u64_to_strlen(subvol_objectid) + 1;
+   dst = kmalloc(len, GFP_NOFS);
+   if (!dst)
+   return NULL;
+   sprintf(dst, "%s%llu", subvol_string, subvol_objectid);
+   return dst;
+   }
  
-	/*

-* We need the same args as before, but with this substitution:
-* s!subvol=[^,]+!subvolid=0!
-*
-* Since the replacement string is up to 2 bytes longer than the
-* original, allocate strlen(args) + 2 + 1 bytes.
-*/
+   switch (flags) {
+   case CLEAR_SUBVOL:
+   src = strstr(args, "subvol=");
+   break;
+   case CLEAR_SUBVOLID:
+   src = strstr(args, "subvolid=");
+   break;
+   }
  
-	src = strstr(args, "subvol=");

-   /* This shouldn't happen, but just in case.. */
-   if (!src)
-   return NULL;
+   if (!src) {
+   /* Case 2, some args, default subvolume mounting
+* just append ',subvolid=FS_ROOT' */
+
+   /* 1 for ending '\0', 1 for leading ',' */
+   len = strlen(args) + strlen(subvol_string) +
+ u64_to_strlen(subvol_objectid) + 2;
+   dst = kmalloc(len, GFP_NOFS);
+   if (!dst)
+   return NULL;
+   strcpy(dst, args);
+   sprintf(dst + strlen(args), ",%s%llu", subvol_string,
+   subvol_objectid);
+   return dst;
+   }
+
+   /* Case 3, subvolid=/subvol=  mount
+* repalce the 'subvolid/subvol' options to 'subvolid=FS_ROOT' */
+   comma = strchr(src, ',');
+   if (comma)
+   option_len = comma - src;
+   else
+   option_len = strlen(src);
+   len = strlen(args) - option_len  +

Re: btrfs_qgroup_create unused parameter


Hi Kevin,

On 07/25/2014 07:23 AM, Kevin Brandstatter wrote:

I submitted a patch for this a week or two ago
(https://patchwork.kernel.org/patch/4486121/), but the latest for-linus
doesn't have it merged. Is it just being put off as minor, or is there a
problem with it?

I believe your patch will be picked up by Chris and sent to Linus
when next merge window is open.

Since Chris is sometimes busy, patch merging is always
delayed for some time.

Thanks,
Wang


-Kevin

On 07/04/2014 09:09 PM, Wang Shilong wrote:

Hi

I think you are right,  @name here is unneeded..
You can give a patch for that.^_^

Wang

The code is pasted below for convenience of reference. The function that
creates a qgroup takes a fourth parameter (char *name). I assume this is the
name of the path to limit; however, I don't see it used anywhere in the
function.

-Kevin Brandstatter

int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
			struct btrfs_fs_info *fs_info, u64 qgroupid, char *name)
{
	struct btrfs_root *quota_root;
	struct btrfs_qgroup *qgroup;
	int ret = 0;

	mutex_lock(&fs_info->qgroup_ioctl_lock);
	quota_root = fs_info->quota_root;
	if (!quota_root) {
		ret = -EINVAL;
		goto out;
	}
	qgroup = find_qgroup_rb(fs_info, qgroupid);
	if (qgroup) {
		ret = -EEXIST;
		goto out;
	}

	ret = add_qgroup_item(trans, quota_root, qgroupid);
	if (ret)
		goto out;

	spin_lock(&fs_info->qgroup_lock);
	qgroup = add_qgroup_rb(fs_info, qgroupid);
	spin_unlock(&fs_info->qgroup_lock);

	if (IS_ERR(qgroup))
		ret = PTR_ERR(qgroup);
out:
	mutex_unlock(&fs_info->qgroup_ioctl_lock);
	return ret;
}


Re: [PATCH] btrfs: Return right extent when fiemap gives unaligned offset and len.

 Original Message 
Subject: Re: [PATCH] btrfs: Return right extent when fiemap gives 
unaligned offset and len.

From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-24 20:17

On Fri, Jul 18, 2014 at 09:55:43AM +0800, Qu Wenruo wrote:

When page aligned start and len passed to extent_fiemap(), the result is
good, but when start and len is not aligned, e.g. start = 1 and len =
4095 is passed to extent_fiemap(), it returns no extent.

The problem is that start and len is all rounded down which causes the
problem.

ALIGN rounds up, not down. So the wrong rounding will use an incorrect start
(4096) and find no extents if there's e.g. only one, [0,4095].
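
The rounding issue can be reproduced in userspace with a few lines. This is a
sketch only: the macros are written out by hand rather than taken from kernel
headers, the struct and function names are made up, and it assumes the old
code rounded len up in the same way as start.

```c
#include <stdint.h>

/* Hand-written rounding helpers; `a` is the sector size. */
#define SK_ROUND_UP(x, a)   ((((x) + (a) - 1) / (a)) * (a))
#define SK_ROUND_DOWN(x, a) (((x) / (a)) * (a))

struct range {
	uint64_t start;
	uint64_t len;
};

/* Old behaviour (assumed): both start and len are rounded up. */
static struct range align_old(uint64_t start, uint64_t len, uint64_t sector)
{
	struct range r = { SK_ROUND_UP(start, sector), SK_ROUND_UP(len, sector) };

	return r;
}

/* Fixed behaviour: round start down and the end (start + len) up. */
static struct range align_fixed(uint64_t start, uint64_t len, uint64_t sector)
{
	uint64_t s = SK_ROUND_DOWN(start, sector);
	uint64_t e = SK_ROUND_UP(start + len, sector);
	struct range r = { s, e - s };

	return r;
}
```

For start = 1, len = 4095 and a 4096-byte sector, align_old() gives a range
starting at 4096, which misses an extent at [0,4095], while align_fixed()
gives start 0 and length 4096, which covers it.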

Sorry for the wrong description in patch.
Should I reword the patch and send a v2 patch?

Thanks,
Qu

This patch will round down start and round up (start + len) to
return right extent.

Reported-by: Chandan Rajendra chan...@linux.vnet.ibm.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com

Reviewed-by: David Sterba dste...@suse.cz


integration tree updated

Hi everyone,

I've pushed out my current integration branch.  It does have a few of
Miao Xie's patches missing because there were some rejects.  I think
this was just because some things got pulled in out of order, and I'll
get it fixed up.

Also missing is Mark's quota snapshot deletion fixes.  They were
crashing during btrfs/011 with CONFIG_DEBUG_PAGE_ALLOC on.  We'll get
that nailed down.

integration is subject to rebasing, so please treat it more like a patch
queue.  It is very lightly tested, the goal is just to show which
patches are already applied and which ones are still pending.

Thanks!

-chris



[PATCH v2] btrfs: Return right extent when fiemap gives unaligned offset and len.

When a page-aligned start and len are passed to extent_fiemap(), the
result is good, but when start and len are not aligned, e.g. start = 1
and len = 4095, it returns no extent.

The problem is that start and len are both rounded up. This patch rounds
start down and rounds (start + len) up so that the right extent is
returned.

Reported-by: Chandan Rajendra chan...@linux.vnet.ibm.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
changelog:
v2: reword the description(ALIGN rounds up, not rounds down).
---
 fs/btrfs/extent_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a389820..1c70cff 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4213,8 +4213,8 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		return -ENOMEM;
 	path->leave_spinning = 1;
 
-	start = ALIGN(start, BTRFS_I(inode)->root->sectorsize);
-	len = ALIGN(len, BTRFS_I(inode)->root->sectorsize);
+	start = round_down(start, BTRFS_I(inode)->root->sectorsize);
+	len = round_up(max, BTRFS_I(inode)->root->sectorsize) - start;
 
 	/*
 	 * lookup the last file extent.  We're not using i_size here
-- 
2.0.2


Re: [PATCH] Btrfs: fix compressed write corruption on enospc


On 07/24/2014 10:48 PM, Liu Bo wrote:
> When we fail to allocate space for the whole compressed extent, we
> fall back to uncompressed IO, but we forget to redirty the pages that
> belong to this compressed extent. These 'clean' pages then skip the
> 'submit' step and go straight to endio, so we end up with data
> corruption because nothing is written.
>
> Signed-off-by: Liu Bo bo.li@oracle.com
> ---
>  fs/btrfs/inode.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3668048..8ea7610 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -709,6 +709,18 @@ retry:
>  			unlock_extent(io_tree, async_extent->start,
>  				      async_extent->start +
>  				      async_extent->ram_size - 1);
> +
> +			/*
> +			 * we need to redirty the pages if we decide to
> +			 * fallback to uncompressed IO, otherwise we
> +			 * will not submit these pages down to lower
> +			 * layers.
> +			 */
> +			extent_range_redirty_for_io(inode,
> +					async_extent->start,
> +					async_extent->start +
> +					async_extent->ram_size - 1);
> +
>  			goto retry;

BTW, if such an ENOSPC happens, it means we could not reserve compressed
space. So we retry with the no-compression code path, which will try to
reserve even more space. Is there a reason we do this?

Thanks,
Wang

>  		}
>  		goto out_free;



Re: [PATCH] Btrfs: fix compressed write corruption on enospc


On 07/25/2014 10:08 AM, Liu Bo wrote:
> On Fri, Jul 25, 2014 at 09:53:43AM +0800, Wang Shilong wrote:
>> On 07/24/2014 10:48 PM, Liu Bo wrote:
>>> When we fail to allocate space for the whole compressed extent, we
>>> fall back to uncompressed IO, but we forget to redirty the pages that
>>> belong to this compressed extent. These 'clean' pages then skip the
>>> 'submit' step and go straight to endio, so we end up with data
>>> corruption because nothing is written.
>>>
>>> Signed-off-by: Liu Bo bo.li@oracle.com
>>> ---
>>>  fs/btrfs/inode.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 3668048..8ea7610 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -709,6 +709,18 @@ retry:
>>>  			unlock_extent(io_tree, async_extent->start,
>>>  				      async_extent->start +
>>>  				      async_extent->ram_size - 1);
>>> +
>>> +			/*
>>> +			 * we need to redirty the pages if we decide to
>>> +			 * fallback to uncompressed IO, otherwise we
>>> +			 * will not submit these pages down to lower
>>> +			 * layers.
>>> +			 */
>>> +			extent_range_redirty_for_io(inode,
>>> +					async_extent->start,
>>> +					async_extent->start +
>>> +					async_extent->ram_size - 1);
>>> +
>>>  			goto retry;
>>
>> BTW, if such an ENOSPC happens, it means we could not reserve compressed
>> space. So we retry with the no-compression code path, which will try to
>> reserve even more space. Is there a reason we do this?
>
> Compressed extents need contiguous space, while uncompressed extents
> have more choices.

Yeah, that is reasonable. Thanks for your answer.^_^

> thanks,
> -liubo
>
>> Thanks,
>> Wang
>>
>>>  		}
>>>  		goto out_free;


Re: integration tree updated