date:20140919

Re: [PATCH] xfstests: remove check_scratch_fs in btrfs/012

2014-09-19 Thread Eryu Guan

On Wed, Sep 03, 2014 at 11:25:59AM +0800, Liu Bo wrote:
 From: Liu Bo liub.li...@gmail.com
 
 btrfs/012 verifies the btrfs-convert feature: it first converts an ext4
 filesystem to btrfs, does some work on it, and then rolls back to ext4.
 
 So at the end we have an ext4 filesystem on the scratch device, but setting
 _require_scratch forces a btrfsck on that ext4 fs because $FSTYP here is
 btrfs, and it ends up with a failure report from _check_btrfs_filesystem.
 
 Now that we deliberately check the final ext4 fs in btrfs/012, just do not
 set _require_scratch in this case.
 
 Signed-off-by: Liu Bo liub.li...@gmail.com
 ---
  tests/btrfs/012 | 1 -
  1 file changed, 1 deletion(-)
 
 diff --git a/tests/btrfs/012 b/tests/btrfs/012
 index f7e5da5..12f6462 100755
 --- a/tests/btrfs/012
 +++ b/tests/btrfs/012
 @@ -52,7 +52,6 @@ _cleanup()
  # Modify as appropriate.
  _supported_fs btrfs
  _supported_os Linux
 -_require_scratch

The test still requires a scratch device, so we cannot simply remove
this line. Now we can use _require_scratch_nocheck helper, and it
works fine based on my test.

Thanks,
Eryu

  
  BTRFS_CONVERT_PROG=`set_prog_path btrfs-convert`
  MKFS_EXT4_PROG=`set_prog_path mkfs.ext4`
 -- 
 1.8.1.4
 
 ___
 xfs mailing list
 x...@oss.sgi.com
 http://oss.sgi.com/mailman/listinfo/xfs
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/5] btrfs: correct a message on setting nodatacow

2014-09-19 Thread Satoru Takeuchi


Hi Qu,

Thank you for your comment.

(2014/09/19 11:03), Qu Wenruo wrote:


 Original Message 
Subject: [PATCH 2/5] btrfs: correct a message on setting nodatacow
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
Date: 2014-09-18 16:28

From: Naohiro Aota na...@elisp.net

If we set nodatacow mount option after compress-force option,
we don't get compression disabling message.

===
$ sudo mount -o remount,compress-force,nodatacow /; dmesg|tail -n 3
[ 3845.719047] BTRFS info (device vda2): force zlib compression
[ 3845.719052] BTRFS info (device vda2): setting nodatacow
[ 3845.719055] BTRFS info (device vda2): disk space caching is enabled
===

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
  fs/btrfs/super.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d1c5b6d..d131098 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -462,8 +462,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			break;
 		case Opt_nodatacow:
 			if (!btrfs_test_opt(root, NODATACOW)) {
-				if (!btrfs_test_opt(root, COMPRESS) ||
-				    !btrfs_test_opt(root, FORCE_COMPRESS)) {
+				if (btrfs_test_opt(root, COMPRESS)) {
 					btrfs_info(root->fs_info,
 						   "setting nodatacow, compression disabled");
 				} else {
-- 1.8.3.1


Although the patch makes the output correct, the core problem is the missing
check for conflicting options.

The compress-force mount option implies datacow and datasum, but a following
nodatasum disables datasum and compression; in fact these are conflicting
mount options...

Even though the current behavior (a later mount option overrides earlier
ones) is very tolerant, IMO there should be some conflict check for mount
options.

For example, we could first save all the mount options passed in into a
temporary bitmap to find the conflicts, and only when there are none set the
mount options in fs_info.
(Maybe a bitmap is not enough for this case, since we can't distinguish a
default value from a value to be set?)

What do you think about this idea?


I'm against your idea for two reasons; it's better to
keep the current behavior, though it's a bit complex.

First, the rule "last one wins" is not only a convention,
but also what mount(8) says.

https://git.kernel.org/cgit/utils/util-linux/util-linux.git/tree/sys-utils/mount.8#n253

==
The usual behavior is that the last option wins if there are conflicting
ones.
==

Second, if we change the behavior, we would break existing
systems. In the worst case, users would fail to boot their systems
after updating the kernel, because mounting Btrfs would fail
during the init process.

Thanks,
Satoru
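
To illustrate the "last one wins" rule discussed above, here is a minimal
userspace sketch in C. It is not the kernel's btrfs_parse_options(); the
SKETCH_OPT_* bits and the functions sketch_apply_option() and
sketch_parse_options() are hypothetical names used only for illustration,
and the way the options override each other is a simplified model of the
behavior described in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits (not the kernel's BTRFS_MOUNT_* values). */
enum {
	SKETCH_OPT_COMPRESS       = 1u << 0,
	SKETCH_OPT_FORCE_COMPRESS = 1u << 1,
	SKETCH_OPT_NODATACOW      = 1u << 2,
	SKETCH_OPT_NODATASUM      = 1u << 3,
};

/* Apply one option; it overrides any conflicting option seen earlier. */
static int sketch_apply_option(unsigned int *flags, const char *tok)
{
	if (strcmp(tok, "compress") == 0) {
		*flags |= SKETCH_OPT_COMPRESS;
		*flags &= ~(SKETCH_OPT_NODATACOW | SKETCH_OPT_NODATASUM);
	} else if (strcmp(tok, "compress-force") == 0) {
		*flags |= SKETCH_OPT_COMPRESS | SKETCH_OPT_FORCE_COMPRESS;
		*flags &= ~(SKETCH_OPT_NODATACOW | SKETCH_OPT_NODATASUM);
	} else if (strcmp(tok, "nodatacow") == 0) {
		*flags &= ~(SKETCH_OPT_COMPRESS | SKETCH_OPT_FORCE_COMPRESS);
		*flags |= SKETCH_OPT_NODATACOW | SKETCH_OPT_NODATASUM;
	} else {
		return -1;	/* unknown option */
	}
	return 0;
}

/* Walk a comma-separated option string from left to right. */
static int sketch_parse_options(const char *opts, unsigned int *flags)
{
	char tok[32];

	assert(opts != NULL && flags != NULL);

	while (*opts != '\0') {
		size_t n = strcspn(opts, ",");

		if (n >= sizeof(tok))
			return -1;
		memcpy(tok, opts, n);
		tok[n] = '\0';
		if (n > 0 && sketch_apply_option(flags, tok) != 0)
			return -1;
		opts += n;
		if (*opts == ',')
			opts++;
	}
	return 0;
}
```

With this model, "compress-force,nodatacow" ends with only the nodatacow
and nodatasum bits set, while "nodatacow,compress-force" ends with only the
compression bits set. No option string is rejected for being conflicting,
which is the property that keeps existing mount configurations working.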



Thanks,
Qu

Re: [PATCH 2/5] btrfs: correct a message on setting nodatacow

2014-09-19 Thread Qu Wenruo

 Original Message 
Subject: Re: [PATCH 2/5] btrfs: correct a message on setting nodatacow
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org 
linux-btrfs@vger.kernel.org

Date: 2014-09-19 14:45

Hi Qu,

Thank you for your comment.

(2014/09/19 11:03), Qu Wenruo wrote:

 Original Message 
Subject: [PATCH 2/5] btrfs: correct a message on setting nodatacow
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
Date: 2014-09-18 16:28

From: Naohiro Aota na...@elisp.net

If we set nodatacow mount option after compress-force option,
we don't get compression disabling message.

===
$ sudo mount -o remount,compress-force,nodatacow /; dmesg|tail -n 3
[ 3845.719047] BTRFS info (device vda2): force zlib compression
[ 3845.719052] BTRFS info (device vda2): setting nodatacow
[ 3845.719055] BTRFS info (device vda2): disk space caching is enabled
===

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
  fs/btrfs/super.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d1c5b6d..d131098 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -462,8 +462,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			break;
 		case Opt_nodatacow:
 			if (!btrfs_test_opt(root, NODATACOW)) {
-				if (!btrfs_test_opt(root, COMPRESS) ||
-				    !btrfs_test_opt(root, FORCE_COMPRESS)) {
+				if (btrfs_test_opt(root, COMPRESS)) {
 					btrfs_info(root->fs_info,
 						   "setting nodatacow, compression disabled");
 				} else {
-- 1.8.3.1

Although the patch makes the output correct, the core problem is the missing
check for conflicting options.

The compress-force mount option implies datacow and datasum, but a following
nodatasum disables datasum and compression; in fact these are conflicting
mount options...

Even though the current behavior (a later mount option overrides earlier
ones) is very tolerant, IMO there should be some conflict check for mount
options.

For example, we could first save all the mount options passed in into a
temporary bitmap to find the conflicts, and only when there are none set the
mount options in fs_info.
(Maybe a bitmap is not enough for this case, since we can't distinguish a
default value from a value to be set?)

What do you think about this idea?

I'm against your idea for two reasons; it's better to
keep the current behavior, though it's a bit complex.

First, the rule "last one wins" is not only a convention,
but also what mount(8) says.

https://git.kernel.org/cgit/utils/util-linux/util-linux.git/tree/sys-utils/mount.8#n253 

==
The usual behavior is that the last option wins if there are conflicting
ones.
==

Second, if we change the behavior, we would break existing
systems. In the worst case, users would fail to boot their systems
after updating the kernel, because mounting Btrfs would fail
during the init process.

Thanks,
Satoru

That really makes sense.

So I'm OK with keeping things as they are; it's true that the conflict
check is somewhat overkill.

Thanks,
Qu

Thanks,
Qu

Re: [PATCH v2] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map

2014-09-19 Thread Satoru Takeuchi

(2014/09/18 17:55), Qu Wenruo wrote:
 The following commit enhanced merge_extent_mapping() to reduce
 fragmentation in the extent map tree, but it can't handle the case where
 the existing extent map lies before map_start:
 51f39 btrfs: Use right extent length when inserting overlap extent map.
 
 [BUG]
 When the existing extent map's start is before map_start,
 em->len will be negative, which corrupts the extent map and makes
 inserting the new extent map fail.
 This happens when someone gets a large extent map, but by the time it is
 about to be inserted into the extent map tree, someone else has already
 committed some writes and split the huge extent into small parts.
 
 [REPRODUCER]
 It is very easy to trigger using filebench with the randomrw personality.
 It reproduces about 100% of the time when using an 8G preallocated file
 in a 60s randomrw test.
 
 [FIX]
 This patch can now handle any existing extent position.
 Since it does not directly use existing->start, it now finds the
 previous and next extents around map_start.
 So the old existing->start < map_start bug will never happen again.
 
 [ENHANCE]
 This patch inserts the best fitted extent map into the extent map tree,
 rather than the oldest [map_start, map_start + sectorsize) or the
 relatively newer but not perfect [map_start, existing->start).
 
 The patch first searches for an existing extent that does not intersect
 the desired map range [map_start, map_start + len).
 The existing extent will be either before or behind map_start, and based
 on the existing extent, we can find the previous and next extents
 around map_start.
 
 So the best fitted extent would be [prev->end, next->start).
 If prev or next is not found, em->start stands in for prev->end and
 em->end for next->start.
 
 With this patch, fragmentation in the extent map tree should be reduced
 much more than with the 51f39 commit, and an unneeded extent map tree
 search is avoided.
 
 Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
 Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
 Reviewed-by: Liu Bo bo.li@oracle.com

Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Sorry for the late reply. I confirmed that this problem happens without
this patch and does not happen with it.

Thanks,
Satoru
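
The range arithmetic described in the [BUG] and [ENHANCE] sections can be
modelled outside the kernel. The following is a minimal sketch, assuming
half-open byte ranges; struct sketch_range, sketch_old_len() and
sketch_best_fit() are hypothetical names and not part of the patch. It only
mirrors the clamping logic, not the rb-tree lookup or the block_start
adjustment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end); a stand-in for struct extent_map. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Old calculation: len = existing->start - map_start. With unsigned
 * 64-bit arithmetic this wraps around when existing->start < map_start.
 */
static uint64_t sketch_old_len(uint64_t existing_start, uint64_t map_start)
{
	return existing_start - map_start;
}

/*
 * New calculation: clamp em to the gap between its previous and next
 * neighbours. prev or next may be NULL when there is no such neighbour.
 */
static struct sketch_range sketch_best_fit(struct sketch_range em,
					   const struct sketch_range *prev,
					   const struct sketch_range *next,
					   uint64_t map_start)
{
	struct sketch_range out;

	/* Same precondition as the BUG_ON() in the patch. */
	assert(map_start >= em.start && map_start < em.end);

	out.start = prev ? prev->end : em.start;
	if (out.start < em.start)
		out.start = em.start;
	out.end = next ? next->start : em.end;
	if (out.end > em.end)
		out.end = em.end;
	return out;
}
```

For example, with em = [0, 100), prev = [0, 20), next = [60, 80) and
map_start = 30, the sketch yields [20, 60), whereas the old length for an
existing extent starting at 10 would wrap to a huge value.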

 ---
 changelog:
 v2:
  Liu Bo points out that the if() using 'start + len <= existing->start'
  is not equal to the original if(), which may cause problems in no-holes
  mode.
 ---
   fs/btrfs/inode.c | 79 
 
   1 file changed, 57 insertions(+), 22 deletions(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 016c403..b3864b7 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -6191,21 +6191,60 @@ out_fail_inode:
  	goto out_fail;
  }
  
 +/* Find next extent map of a given extent map, caller needs to ensure locks */
 +static struct extent_map *next_extent_map(struct extent_map *em)
 +{
 +	struct rb_node *next;
 +
 +	next = rb_next(&em->rb_node);
 +	if (!next)
 +		return NULL;
 +	return container_of(next, struct extent_map, rb_node);
 +}
 +
 +static struct extent_map *prev_extent_map(struct extent_map *em)
 +{
 +	struct rb_node *prev;
 +
 +	prev = rb_prev(&em->rb_node);
 +	if (!prev)
 +		return NULL;
 +	return container_of(prev, struct extent_map, rb_node);
 +}
 +
  /* helper for btfs_get_extent.  Given an existing extent in the tree,
 + * the existing extent is the nearest extent to map_start,
   * and an extent that you want to insert, deal with overlap and insert
 - * the new extent into the tree.
 + * the best fitted new extent into the tree.
   */
  static int merge_extent_mapping(struct extent_map_tree *em_tree,
  				struct extent_map *existing,
  				struct extent_map *em,
  				u64 map_start)
  {
 +	struct extent_map *prev;
 +	struct extent_map *next;
 +	u64 start;
 +	u64 end;
  	u64 start_diff;
  
  	BUG_ON(map_start < em->start || map_start >= extent_map_end(em));
 -	start_diff = map_start - em->start;
 -	em->start = map_start;
 -	em->len = existing->start - em->start;
 +
 +	if (existing->start > map_start) {
 +		next = existing;
 +		prev = prev_extent_map(next);
 +	} else {
 +		prev = existing;
 +		next = next_extent_map(prev);
 +	}
 +
 +	start = prev ? extent_map_end(prev) : em->start;
 +	start = max_t(u64, start, em->start);
 +	end = next ? next->start : extent_map_end(em);
 +	end = min_t(u64, end, extent_map_end(em));
 +	start_diff = start - em->start;
 +	em->start = start;
 +	em->len = end - start;
  	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
  	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
  		em->block_start += start_diff;
 @@ -6482,25 +6521,21 @@ insert:
  
  	ret = 0;
  
 -	existing = lookup_extent_mapping(em_tree, start, len);
 -	if (existing && (existing->start

[PATCH 1/4] btrfs: correct empty compression property behavior

2014-09-19 Thread Satoru Takeuchi

From: Naohiro Aota na...@elisp.net

In the current implementation, compression property == "" has
two different meanings: one with BTRFS_INODE_NOCOMPRESS set,
and the other without this flag.

So, even if two files a and b have the same compression
property, "", and the same contents, one file may be
compressed while the other is not. This is difficult for users
to understand and confuses them.

Here is a real example. Let's assume the following two cases.

  a) A freshly created file (under a directory with neither the
     COMPRESS nor the NOCOMPRESS flag.)

  b) An existing file whose compression property is explicitly
     set to "".

In addition, here is the command log (I attached the source of
the getflags program in this patch.)

===
$ rm -f a b; touch a b
$ btrfs prop set b compression ""
 # both a and b have the same compression property: ""
$ btrfs prop get a compression
$ btrfs prop get b compression
 # but ... let's take a look at inode flags
$ ./getflags a
0x0
$ ./getflags b
0x400 # 0x400 (FS_NOCOMP_FL) corresponds to BTRFS_INODE_NOCOMPRESS
===

So both files have compression property == "",
but differ in NOCOMPRESS flag state, which leads to different
behavior.

case | BTRFS_INODE_NOCOMPRESS | behavior
=====+========================+====================
   a | unset                  | might be compressed
   b | set                    | never compressed

I think that we should not expect users to remember
whether their files are case a or b, and that we should introduce
another value for the compression property anyway.

getflags.c:
===
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <linux/fs.h>

int main(int argc, char const* argv[])
{
  const char *name = argv[1];
  int fd = open(name, O_RDONLY);
  long x;
  ioctl(fd, FS_IOC_GETFLAGS, &x);
  printf("0x%lx\n", x);
  return 0;
}
===

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/props.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 129b1dd..bf005f4 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -393,8 +393,8 @@ static int prop_compression_apply(struct inode *inode,
 	int type;
 
 	if (len == 0) {
-		BTRFS_I(inode)->flags |= BTRFS_INODE_NOCOMPRESS;
-		BTRFS_I(inode)->flags &= ~BTRFS_INODE_COMPRESS;
+		BTRFS_I(inode)->flags &=
+			~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
 		BTRFS_I(inode)->force_compress = BTRFS_COMPRESS_NONE;
 
 		return 0;
-- 1.8.3.1 


[PATCH 2/4] btrfs: introduce new compression property to disable compression at all

2014-09-19 Thread Satoru Takeuchi

From: Naohiro Aota na...@elisp.net

Introduce a new compression property value, "off", which disables
compression of the file entirely.

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/props.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index bf005f4..38efbe1 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -382,6 +382,8 @@ static int prop_compression_validate(const char *value, size_t len)
 		return 0;
 	else if (!strncmp("zlib", value, len))
 		return 0;
+	else if (!strncmp("off", value, len))
+		return 0;
 
 	return -EINVAL;
 }
@@ -400,6 +402,14 @@ static int prop_compression_apply(struct inode *inode,
 		return 0;
 	}
 
+	if (!strncmp("off", value, len)) {
+		BTRFS_I(inode)->flags |= BTRFS_INODE_NOCOMPRESS;
+		BTRFS_I(inode)->flags &= ~BTRFS_INODE_COMPRESS;
+		BTRFS_I(inode)->force_compress = BTRFS_COMPRESS_NONE;
+
+		return 0;
+	}
+
 	if (!strncmp("lzo", value, len))
 		type = BTRFS_COMPRESS_LZO;
 	else if (!strncmp("zlib", value, len))
@@ -423,5 +433,8 @@ static const char *prop_compression_extract(struct inode *inode)
 		return "lzo";
 	}
 
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
+		return "off";
+
 	return NULL;
 }
-- 1.8.3.1 
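
The mapping between property values and inode flags that patches 1/4 and
2/4 aim for can be summarized with a small userspace model. This is only a
sketch under stated assumptions: struct sketch_inode, the SKETCH_* flags,
sketch_prop_apply() and sketch_prop_extract() are hypothetical stand-ins
for the kernel structures and functions, and exact string comparison is
used instead of the length-limited strncmp() calls in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for BTRFS_INODE_COMPRESS / BTRFS_INODE_NOCOMPRESS. */
#define SKETCH_COMPRESS   (1u << 0)
#define SKETCH_NOCOMPRESS (1u << 1)

struct sketch_inode {
	unsigned int flags;
	const char *type;	/* "lzo", "zlib", or NULL when none is forced */
};

/* Apply a property value; returns 0 on success, -1 for an unknown value. */
static int sketch_prop_apply(struct sketch_inode *inode, const char *value)
{
	assert(inode != NULL && value != NULL);

	if (value[0] == '\0') {
		/* "": clear both flags, so there is no per-file preference. */
		inode->flags &= ~(SKETCH_COMPRESS | SKETCH_NOCOMPRESS);
		inode->type = NULL;
		return 0;
	}
	if (strcmp(value, "off") == 0) {
		/* "off": never compress this file. */
		inode->flags |= SKETCH_NOCOMPRESS;
		inode->flags &= ~SKETCH_COMPRESS;
		inode->type = NULL;
		return 0;
	}
	if (strcmp(value, "lzo") == 0 || strcmp(value, "zlib") == 0) {
		inode->flags |= SKETCH_COMPRESS;
		inode->flags &= ~SKETCH_NOCOMPRESS;
		inode->type = (value[0] == 'l') ? "lzo" : "zlib";
		return 0;
	}
	return -1;
}

/* Report the property value that corresponds to the current state. */
static const char *sketch_prop_extract(const struct sketch_inode *inode)
{
	if (inode->type)
		return inode->type;
	if (inode->flags & SKETCH_NOCOMPRESS)
		return "off";
	return NULL;
}
```

In this model, setting "" always leaves both flags clear, and only "off"
sets the NOCOMPRESS stand-in, so two files with the same property value
can no longer differ in flag state.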


[PATCH 3/4] btrfs: export __btrfs_set_prop

2014-09-19 Thread Satoru Takeuchi

From: Naohiro Aota na...@elisp.net

Export __btrfs_set_prop() so that it can be called
within a running transaction.

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/props.c | 2 +-
 fs/btrfs/props.h | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 38efbe1..6f56f5b 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -99,7 +99,7 @@ find_prop_handler(const char *name,
return NULL;
 }
 
-static int __btrfs_set_prop(struct btrfs_trans_handle *trans,
+int __btrfs_set_prop(struct btrfs_trans_handle *trans,
struct inode *inode,
const char *name,
const char *value,
diff --git a/fs/btrfs/props.h b/fs/btrfs/props.h
index 100f188..cff91e0 100644
--- a/fs/btrfs/props.h
+++ b/fs/btrfs/props.h
@@ -28,6 +28,12 @@ int btrfs_set_prop(struct inode *inode,
   const char *value,
   size_t value_len,
   int flags);
+int __btrfs_set_prop(struct btrfs_trans_handle *trans,
+   struct inode *inode,
+   const char *name,
+   const char *value,
+   size_t value_len,
+   int flags);
 
 int btrfs_load_inode_props(struct inode *inode, struct btrfs_path *path);
 
-- 1.8.3.1 


[PATCH v2 0/2] Move BTRFS RCU string to common library

2014-09-19 Thread Omar Sandoval

This patch series moves the generic RCU string library used internally by BTRFS
to be accessible by anyone. It provides printk_in_rcu and
printk_ratelimited_in_rcu to print these strings. In order to avoid a weird
inconsistency between the two, the first patch fixes printk_ratelimited so it
passes on the return value from printk.

The second patch actually moves the RCU string library. Version 2 passes on the
return values from printk{,_ratelimited} and fixes some style issues.

Omar Sandoval (2):
  Return a value from printk_ratelimited
  Move BTRFS RCU string to common library

 fs/btrfs/check-integrity.c |  6 +--
 fs/btrfs/dev-replace.c | 19 +-
 fs/btrfs/disk-io.c |  6 +--
 fs/btrfs/extent_io.c   |  4 +-
 fs/btrfs/ioctl.c   |  4 +-
 fs/btrfs/raid56.c  |  2 +-
 fs/btrfs/rcu-string.h  | 56 
 fs/btrfs/scrub.c   | 15 
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 14 +++
 include/linux/printk.h |  4 +-
 include/linux/rcustring.h  | 91 ++
 12 files changed, 131 insertions(+), 92 deletions(-)
 delete mode 100644 fs/btrfs/rcu-string.h
 create mode 100644 include/linux/rcustring.h

-- 
2.1.0


[PATCH v2 2/2] Move BTRFS RCU string to common library

2014-09-19 Thread Omar Sandoval

The RCU-friendly string API used internally by BTRFS is generic enough for
common use. This doesn't add any new functionality, but instead just moves the
code and documents the existing API.

Signed-off-by: Omar Sandoval osan...@osandov.com
---
 fs/btrfs/check-integrity.c |  6 +--
 fs/btrfs/dev-replace.c | 19 +-
 fs/btrfs/disk-io.c |  6 +--
 fs/btrfs/extent_io.c   |  4 +-
 fs/btrfs/ioctl.c   |  4 +-
 fs/btrfs/raid56.c  |  2 +-
 fs/btrfs/rcu-string.h  | 56 
 fs/btrfs/scrub.c   | 15 
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 14 +++
 include/linux/rcustring.h  | 91 ++
 11 files changed, 128 insertions(+), 91 deletions(-)
 delete mode 100644 fs/btrfs/rcu-string.h
 create mode 100644 include/linux/rcustring.h

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index ce92ae3..4ccd7da 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -94,6 +94,7 @@
 #include <linux/mutex.h>
 #include <linux/genhd.h>
 #include <linux/blkdev.h>
+#include <linux/rcustring.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "hash.h"
@@ -103,7 +104,6 @@
 #include "print-tree.h"
 #include "locking.h"
 #include "check-integrity.h"
-#include "rcu-string.h"
 
 #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1
 #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1
@@ -851,8 +851,8 @@ static int btrfsic_process_superblock_dev_mirror(
 				printk_in_rcu(KERN_INFO "New initial S-block (bdev %p, %s)"
 						     " @%llu (%s/%llu/%d)\n",
 					      superblock_bdev,
-					      rcu_str_deref(device->name), dev_bytenr,
-					      dev_state->name, dev_bytenr,
+					      rcu_string_dereference(device->name),
+					      dev_bytenr, dev_state->name, dev_bytenr,
 					      superblock_mirror_num);
 			list_add(&superblock_tmp->all_blocks_node,
 				 &state->all_blocks_list);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..87d10cc 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -25,6 +25,7 @@
 #include <linux/capability.h>
 #include <linux/kthread.h>
 #include <linux/math64.h>
+#include <linux/rcustring.h>
 #include <asm/div64.h>
 #include "ctree.h"
 #include "extent_map.h"
@@ -34,7 +35,6 @@
 #include "volumes.h"
 #include "async-thread.h"
 #include "check-integrity.h"
-#include "rcu-string.h"
 #include "dev-replace.h"
 #include "sysfs.h"
 
@@ -376,9 +376,9 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
 	printk_in_rcu(KERN_INFO
 		      "BTRFS: dev_replace from %s (devid %llu) to %s started\n",
 		      src_device->missing ? "<missing disk>" :
-			rcu_str_deref(src_device->name),
+		      rcu_string_dereference(src_device->name),
 		      src_device->devid,
-		      rcu_str_deref(tgt_device->name));
+		      rcu_string_dereference(tgt_device->name));
 
 	tgt_device->total_bytes = src_device->total_bytes;
 	tgt_device->disk_total_bytes = src_device->disk_total_bytes;
@@ -528,9 +528,10 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 		printk_in_rcu(KERN_ERR
 			      "BTRFS: btrfs_scrub_dev(%s, %llu, %s) failed %d\n",
 			      src_device->missing ? "<missing disk>" :
-				rcu_str_deref(src_device->name),
+			      rcu_string_dereference(src_device->name),
 			      src_device->devid,
-			      rcu_str_deref(tgt_device->name), scrub_ret);
+			      rcu_string_dereference(tgt_device->name),
+			      scrub_ret);
 		btrfs_dev_replace_unlock(dev_replace);
 		mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 		mutex_unlock(&root->fs_info->chunk_mutex);
@@ -544,9 +545,9 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	printk_in_rcu(KERN_INFO
 		      "BTRFS: dev_replace from %s (devid %llu) to %s) finished\n",
 		      src_device->missing ? "<missing disk>" :
-			rcu_str_deref(src_device->name),
+		      rcu_string_dereference(src_device->name),
 		      src_device->devid,
-		      rcu_str_deref(tgt_device->name));
+		      rcu_string_dereference(tgt_device->name));
 	tgt_device->is_tgtdev_for_dev_replace = 0;
 	tgt_device->devid = src_device->devid;
 	src_device->devid = BTRFS_DEV_REPLACE_DEVID;
@@ -805,10 +806,10 @@ static int btrfs_dev_replace_kthread(void *data)
 		printk_in_rcu(KERN_INFO
 			"BTRFS: continuing dev_replace from %s (devid

[PATCH v2 1/2] Return a value from printk_ratelimited

2014-09-19 Thread Omar Sandoval

printk returns an integer; there's no reason for printk_ratelimited to swallow
it.

Signed-off-by: Omar Sandoval osan...@osandov.com
---
 include/linux/printk.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index d78125f..67534bc 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -343,12 +343,14 @@ extern asmlinkage void dump_stack(void) __cold;
 #ifdef CONFIG_PRINTK
 #define printk_ratelimited(fmt, ...)					\
 ({									\
+	int __ret = 0;							\
 	static DEFINE_RATELIMIT_STATE(_rs,				\
 				      DEFAULT_RATELIMIT_INTERVAL,	\
 				      DEFAULT_RATELIMIT_BURST);		\
 									\
 	if (__ratelimit(&_rs))						\
-		printk(fmt, ##__VA_ARGS__);				\
+		__ret = printk(fmt, ##__VA_ARGS__);			\
+	__ret;								\
 })
 #else
 #define printk_ratelimited(fmt, ...)					\
-- 
2.1.0
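
The pattern used in this patch, a statement expression whose last
expression becomes the macro's value, can be tried in userspace. Below is
a minimal sketch, assuming GCC or Clang (statement expressions and
##__VA_ARGS__ are GNU extensions). struct sketch_ratelimit,
sketch_ratelimit_ok() and sketch_printf_ratelimited() are hypothetical
names, and the limiter is a simple call counter rather than the kernel's
time-based __ratelimit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical limiter: allow at most `burst` messages in total. */
struct sketch_ratelimit {
	int burst;
	int used;
};

/* Returns 1 if another message may be printed, 0 if it must be dropped. */
static int sketch_ratelimit_ok(struct sketch_ratelimit *rs)
{
	assert(rs != NULL);

	if (rs->used >= rs->burst)
		return 0;
	rs->used++;
	return 1;
}

/*
 * Evaluates to the return value of printf() when the message is printed
 * and to 0 when it is suppressed, like the patched printk_ratelimited().
 */
#define sketch_printf_ratelimited(rs, fmt, ...)			\
({								\
	int ret_ = 0;						\
								\
	if (sketch_ratelimit_ok(rs))				\
		ret_ = printf(fmt, ##__VA_ARGS__);		\
	ret_;							\
})
```

A caller can then write `int n = sketch_printf_ratelimited(&rs, "%s\n",
"abc");` and treat 0 as "suppressed", which is the kind of consistency
with the unlimited variant that the cover letter asks for.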


[PATCH 4/4] btrfs: Fix compression related ioctl to run atomic operations in one transaction

2014-09-19 Thread Satoru Takeuchi

From: Naohiro Aota na...@elisp.net

Fix the following two problems in the compression-related ioctl code.

  a) The compression flags and the inode attribute are updated
     in two separate transactions. So, if something bad happens
     after the former and before the latter, the file system
     would end up in an inconsistent state.

     This patch moves them into one transaction.

  b) The ioctl code updates the compression flags and calls
     btrfs_set_prop() after that. However, the flags are also
     updated in that function.

     This patch removes the duplicated flag-updating code
     from the ioctl code and leaves this work entirely to
     __btrfs_set_prop().

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/ioctl.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ff2127..47ac6da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -221,6 +221,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	u64 ip_oldflags;
 	unsigned int i_oldflags;
 	umode_t mode;
+	const char *comp;
 
 	if (!inode_owner_or_capable(inode))
 		return -EPERM;
@@ -310,40 +311,29 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	 * things smaller.
 	 */
 	if (flags & FS_NOCOMP_FL) {
-		ip->flags &= ~BTRFS_INODE_COMPRESS;
-		ip->flags |= BTRFS_INODE_NOCOMPRESS;
-
-		ret = btrfs_set_prop(inode, "btrfs.compression", NULL, 0, 0);
-		if (ret && ret != -ENODATA)
-			goto out_drop;
+		comp = "off";
 	} else if (flags & FS_COMPR_FL) {
-		const char *comp;
-
-		ip->flags |= BTRFS_INODE_COMPRESS;
-		ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
-
 		if (root->fs_info->compress_type == BTRFS_COMPRESS_LZO)
 			comp = "lzo";
 		else
 			comp = "zlib";
-		ret = btrfs_set_prop(inode, "btrfs.compression",
-				     comp, strlen(comp), 0);
-		if (ret)
-			goto out_drop;
-
 	} else {
-		ret = btrfs_set_prop(inode, "btrfs.compression", NULL, 0, 0);
-		if (ret && ret != -ENODATA)
-			goto out_drop;
-		ip->flags &= ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
+		comp = "";
 	}
 
-	trans = btrfs_start_transaction(root, 1);
+	trans = btrfs_start_transaction(root, 2);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		goto out_drop;
 	}
 
+	ret = __btrfs_set_prop(trans, inode, "btrfs.compression", comp,
+			       strlen(comp), 0);
+	if (ret && ret != -ENODATA) {
+		btrfs_end_transaction(trans, root);
+		goto out_drop;
+	}
+
 	btrfs_update_iflags(inode);
 	inode_inc_iversion(inode);
 	inode->i_ctime = CURRENT_TIME;
-- 1.8.3.1 
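
After this patch the ioctl only has to pick a property value and hand it
to __btrfs_set_prop() inside the single transaction. That selection step
can be sketched in userspace as follows. This is an illustrative model
only: SKETCH_COMPR_FL and SKETCH_NOCOMP_FL are hypothetical stand-ins whose
values mirror FS_COMPR_FL and FS_NOCOMP_FL, and sketch_comp_for_flags() and
its fs_default parameter are assumptions, not names from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins mirroring FS_COMPR_FL (0x4) and FS_NOCOMP_FL (0x400). */
#define SKETCH_COMPR_FL  0x00000004u
#define SKETCH_NOCOMP_FL 0x00000400u

/*
 * Choose the "btrfs.compression" property value for the given inode flags.
 * fs_default is the filesystem-wide compression type, "lzo" or "zlib".
 */
static const char *sketch_comp_for_flags(unsigned int flags,
					 const char *fs_default)
{
	assert(fs_default != NULL);

	if (flags & SKETCH_NOCOMP_FL)
		return "off";		/* never compress this file */
	if (flags & SKETCH_COMPR_FL)
		return strcmp(fs_default, "lzo") == 0 ? "lzo" : "zlib";
	return "";			/* no per-file preference */
}
```

As in the patched ioctl, the no-compress flag is tested first, so it takes
precedence when both bits are set in the requested flags.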


Help for creating a useful bugreport

2014-09-19 Thread Jakob Breier


Hi,

my btrfs partition got corrupted. With some trouble I got most of the 
valuable data out of it using `btrfs restore -i` (it crashed a few 
times, but on the fourth or fifth run it reached the stuff I wanted to 
recover). As far as I can tell, the file system broke during normal 
operations without any hardware failures. Before I switch back to ext4, 
I'd like to file a bug report so my troubles were not completely in 
vain. Unfortunately I don't have much to work with. Can you help me with 
extracting enough information to create a useful bugreport?


Regards,
Jakob

$ cat /etc/fedora-release
Fedora release 20 (Heisenbug)

$ uname -a
Linux localhost.localdomain 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8 
11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


$ sudo btrfs fi show /dev/dm-1
Label: 'EncJakobExtern'  uuid: 8ccbd085-564d-4022-bfc9-18fd429d0a8d
Total devices 1 FS bytes used 731.07GiB
devid1 size 931.60GiB used 769.04GiB path 
/dev/mapper/luks-a266c492-2360-404d-9ad7-00edc2f0c09d


Btrfs v3.16

$ sudo mount /dev/dm-1 /mnt/diverse/
mount: wrong fs type, bad option, bad superblock on 
/dev/mapper/luks-a266c492-2360-404d-9ad7-00edc2f0c09d,

   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

$ sudo mount /dev/dm-1 /mnt/diverse/ -o recovery
mount: wrong fs type, bad option, bad superblock on 
/dev/mapper/luks-a266c492-2360-404d-9ad7-00edc2f0c09d,

   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.


*Syslog:*
Sep 19 10:11:11 localhost.localdomain kernel: BTRFS: device label 
EncJakobExtern devid 1 transid 4923 /dev/dm-1
Sep 19 10:11:11 localhost.localdomain udisksd[1971]: Unlocked LUKS 
device /dev/sdb1 as /dev/dm-1


Sep 19 10:16:13 localhost.localdomain sudo[5080]: jakob : TTY=pts/6 ; 
PWD=/home/jakob ; USER=root ; COMMAND=/bin/mount /dev/dm-1 /mnt/diverse/
Sep 19 10:16:17 localhost.localdomain kernel: BTRFS info (device dm-1): 
disk space caching is enabled
Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify 
failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify 
failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:18 localhost.localdomain kernel: BTRFS: Failed to read 
block groups: -5

Sep 19 10:16:18 localhost.localdomain kernel: BTRFS: open_ctree failed

Sep 19 10:16:45 localhost.localdomain sudo[5456]: jakob : TTY=pts/6 ; 
PWD=/home/jakob ; USER=root ; COMMAND=/bin/mount /dev/dm-1 /mnt/diverse/ 
-o recovery
Sep 19 10:16:45 localhost.localdomain kernel: BTRFS info (device dm-1): 
enabling auto recovery
Sep 19 10:16:45 localhost.localdomain kernel: BTRFS info (device dm-1): 
disk space caching is enabled
Sep 19 10:16:46 localhost.localdomain kernel: parent transid verify 
failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:46 localhost.localdomain kernel: parent transid verify 
failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:46 localhost.localdomain kernel: BTRFS: Failed to read 
block groups: -5

Sep 19 10:16:46 localhost.localdomain kernel: BTRFS: open_ctree failed


*fstab entry with which it was usually mounted:*
/dev/mapper/LuksOpenendEncJakobExtern /mnt/EncJakobExtern btrfs 
compress=lzo,nofail 0 0


Re: [PATCH] Revert Btrfs: device_list_add() should not update list when

2014-09-19 Thread Anand Jain




  Looks good to me Chris. Thank you.


Reviewed-by: Anand Jain anand.j...@oracle.com


On 09/18/2014 11:00 PM, Chris Mason wrote:


Johannes and Sam, could you please confirm this patch fixes your mount
regression for now?  Anand, please make sure I kept the generation check
properly.

This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

This commit is triggering failures to mount by subvolume id in some
configurations.  The main problem is how many different ways this
scanning function is used, both for scanning while mounted and
unmounted.  A proper cleanup is too big for late rcs.

For now, just revert the commit and we'll put a better fix into a later
merge window.

Signed-off-by: Chris Mason c...@fb.com
---
  fs/btrfs/volumes.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 340a92d..2c2d6d1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -529,12 +529,12 @@ static noinline int device_list_add(const char *path,
 		 */
 
 		/*
-		 * As of now don't allow update to btrfs_fs_device through
-		 * the btrfs dev scan cli, after FS has been mounted.
+		 * For now, we do allow update to btrfs_fs_device through the
+		 * btrfs dev scan cli after FS has been mounted.  We're still
+		 * tracking a problem where systems fail mount by subvolume id
+		 * when we reject replacement on a mounted FS.
 		 */
-		if (fs_devices->opened) {
-			return -EBUSY;
-		} else {
+		if (!fs_devices->opened && found_transid < device->generation) {
 			/*
 			 * That is if the FS is _not_ mounted and if you
 			 * are here, that means there is more than one
@@ -542,8 +542,7 @@ static noinline int device_list_add(const char *path,
 			 * with larger generation number or the last-in if
 			 * generation are equal.
 			 */
-			if (found_transid < device->generation)
-				return -EEXIST;
+			return -EEXIST;
 		}
 
 		name = rcu_string_strdup(path, GFP_NOFS);
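
For readers following the hunk above, the behavioural difference can be modelled in user space. This is only a sketch, not kernel code: struct scan_state, check_before_revert() and check_after_revert() are hypothetical names, and the only thing taken from the diff is the comparison of the opened flag, found_transid and the device generation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three values the hunk consults. */
struct scan_state {
	bool opened;            /* models fs_devices->opened: FS is mounted */
	uint64_t found_transid; /* generation of the device being scanned */
	uint64_t generation;    /* models device->generation already listed */
};

/* Logic removed by the revert: any rescan of a mounted FS is refused. */
int check_before_revert(const struct scan_state *s)
{
	assert(s != NULL);
	if (s->opened)
		return -EBUSY;
	if (s->found_transid < s->generation)
		return -EEXIST;
	return 0;
}

/*
 * Logic restored by the revert: only an older-generation device on an
 * unmounted FS is refused; everything else falls through to the update.
 */
int check_after_revert(const struct scan_state *s)
{
	assert(s != NULL);
	if (!s->opened && s->found_transid < s->generation)
		return -EEXIST;
	return 0;
}
```

With opened set, the first function returns -EBUSY while the second returns 0, which is the case the mount-by-subvolume-id reports appear to depend on.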



Performance Issues

2014-09-19 Thread Rob Spanton

Hi,

I have a particularly uncomplicated setup (a desktop PC with a hard
disk) and I'm seeing particularly slow performance from btrfs.  A `git
status` in the linux source tree takes about 46 seconds after dropping
caches, whereas on other machines using ext4 this takes about 13s.  My
mail client (evolution) also seems to perform particularly poorly on
this setup, and my hunch is that it's spending a lot of time waiting on
the filesystem.

I've tried mounting with noatime, and this has had no effect.  Anyone
got any ideas?  

Here are the things that the wiki page asked for [1]:

uname -a:

Linux zarniwoop.blob 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8
11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version:

Btrfs v3.16

btrfs fi show:

Label: 'fedora'  uuid: 717c0a1b-815c-4e6a-86c0-60b921e84d75
Total devices 1 FS bytes used 1.49TiB
devid1 size 2.72TiB used 1.50TiB path /dev/sda4

Btrfs v3.16

btrfs fi df /:

Data, single: total=1.48TiB, used=1.48TiB
System, DUP: total=32.00MiB, used=208.00KiB
Metadata, DUP: total=11.50GiB, used=10.43GiB
unknown, single: total=512.00MiB, used=0.00

dmesg dump is attached.

Please CC any responses to me, as I'm not subscribed to the list.

Cheers,

Rob

[1] https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list


[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.16.2-200.fc20.x86_64 (mockbu...@bkernel01.phx2.fedoraproject.org) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #1 SMP Mon Sep 8 11:54:45 UTC 2014
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.16.2-200.fc20.x86_64 root=UUID=717c0a1b-815c-4e6a-86c0-60b921e84d75 ro rootflags=subvol=root vconsole.font=latarcyrheb-sun16 rhgb quiet LANG=en_GB.UTF-8
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009dbff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xdfed] usable
[0.00] BIOS-e820: [mem 0xdfee-0xdfee2fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdfee3000-0xdfee] ACPI data
[0.00] BIOS-e820: [mem 0xdfef-0xdfef] reserved
[0.00] BIOS-e820: [mem 0xf000-0xf3ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00019fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.4 present.
[0.00] DMI: Gigabyte Technology Co., Ltd. P35-S3G/P35-S3G, BIOS F5 06/19/2009
[0.00] e820: update [mem 0x-0x0fff] usable == reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x1a max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-CDFFF write-protect
[0.00]   CE000-E uncachable
[0.00]   F-F write-through
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 0 mask F write-back
[0.00]   1 base 0E000 mask FE000 uncachable
[0.00]   2 base 1 mask F write-back
[0.00]   3 base 1C000 mask FC000 uncachable
[0.00]   4 base 1A000 mask FE000 uncachable
[0.00]   5 base 0DFF0 mask 0 uncachable
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] e820: update [mem 0xdff0-0x] usable == reserved
[0.00] e820: last_pfn = 0xdfee0 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000f50d0-0x000f50df] mapped at [880f50d0]
[0.00] Base memory trampoline at [88097000] 97000 size 24576
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] BRK [0x02004000, 0x02004fff] PGTABLE
[0.00] BRK [0x02005000, 0x02005fff] PGTABLE
[0.00] BRK [0x02006000, 0x02006fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x19fe0-0x19fff]
[0.00]  [mem 0x19fe0-0x19fff] page 2M
[0.00] BRK [0x02007000, 0x02007fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x19c00-0x19fdf]
[0.00]  [mem 0x19c00-0x19fdf] page 2M
[0.00] init_memory_mapping: [mem 0x18000-0x19bff]
[0.00]  [mem 0x18000-0x19bff] page 2M
[0.00]

Re: Performance Issues

2014-09-19 Thread Swâmi Petaramesh

On Friday 19 September 2014, 13:18:34, Rob Spanton wrote:
 I have a particularly uncomplicated setup (a desktop PC with a hard
 disk) and I'm seeing particularly slow performance from btrfs.

Weeelll I have the same over-complicated kind of setup, and an Arch Linux
BTRFS system which used to boot in a decent amount of time now
takes about 5 full minutes just to make it to the KDM login prompt, and
another 5 minutes before KDE is fully started. Makes me think of the good ole'
times of Windows 95 OSR2 on a 486SX with a dying 1 GB hard disk...

Now, let me add that I had removed all snapshots, run a full defrag, and even
rebalanced the damn thing without any positive effect...

(And yes, my HD is physically in good shape, SMART feels fully happy, and it's
less than 75% full...)

I've been using BTRFS for 2-3 years on a dozen different systems, and if
something doesn't surprise me at all, it's « slow performance », indeed,
although I'm myself more accustomed to « incredibly fscking damn slow
performance »...

HTH

-- 
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E

A man should not swallow more tall tales than he can digest.
-- Henry Brooks Adams


Re: Performance Issues

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 08:18, Rob Spanton wrote:

Hi,

I have a particularly uncomplicated setup (a desktop PC with a hard
disk) and I'm seeing particularly slow performance from btrfs.  A `git
status` in the linux source tree takes about 46 seconds after dropping
caches, whereas on other machines using ext4 this takes about 13s.  My
mail client (evolution) also seems to perform particularly poorly on
this setup, and my hunch is that it's spending a lot of time waiting on
the filesystem.

I've tried mounting with noatime, and this has had no effect.  Anyone
got any ideas?

Here are the things that the wiki page asked for [1]:

uname -a:

 Linux zarniwoop.blob 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8
 11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version:

 Btrfs v3.16

btrfs fi show:

 Label: 'fedora'  uuid: 717c0a1b-815c-4e6a-86c0-60b921e84d75
Total devices 1 FS bytes used 1.49TiB
devid1 size 2.72TiB used 1.50TiB path /dev/sda4

 Btrfs v3.16

btrfs fi df /:

 Data, single: total=1.48TiB, used=1.48TiB
 System, DUP: total=32.00MiB, used=208.00KiB
 Metadata, DUP: total=11.50GiB, used=10.43GiB
 unknown, single: total=512.00MiB, used=0.00

dmesg dump is attached.

Please CC any responses to me, as I'm not subscribed to the list.

Cheers,

Rob

[1] https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list


WRT the performance of Evolution, the issue is probably fragmentation of 
the data files.  If you run the command:

# btrfs fi defrag -rv /home

you should see some improvement in evolution performance (until you get
any new mail, that is).  Evolution (like most graphical e-mail clients
these days) uses sqlite for data storage, and sqlite database files are 
one of the known pathological cases for COW filesystems in general; the 
solution is to mark the files as NOCOW (see the info about VM images in 
[1] and [2], the same suggestions apply to database files).
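
As an illustration of what marking a file NOCOW involves, here is a minimal sketch using the generic Linux inode-flag ioctls, which is the mechanism `chattr +C` relies on. The helper names with_nocow() and set_nocow() are made up for this example. Note the assumption that the file is new or still empty: as far as I know the flag is only reliable on btrfs when set before any data is written, so an existing database usually has to be recreated in a NOCOW file or directory.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Pure helper: return the inode flag word with the NOCOW bit added. */
int with_nocow(int flags)
{
	return flags | FS_NOCOW_FL;
}

/*
 * Set the NOCOW attribute on an already opened file descriptor.
 * Returns 0 on success or a negative errno value, for example when the
 * underlying filesystem does not support inode flags.
 */
int set_nocow(int fd)
{
	int flags = 0;

	assert(fd >= 0);
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -errno;

	flags = with_nocow(flags);
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		return -errno;

	return 0;
}
```

Setting the flag on a directory instead makes newly created files inside it inherit it, which tends to be the more practical route for a mail client's data directory.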


As for git, I haven't seen any performance issues specific to BTRFS; are
you using any compress= mount option? zlib-based compression is known to
cause serious slowdowns.  I don't think that git uses any kind of
database for data storage.  Also, if the performance comparison is
against other systems, it isn't really a good comparison unless those
systems have the EXACT same hardware configuration.  Unless the PC this
is on is relatively recent (less than a year or two old), the hardware
may simply be the performance bottleneck.





Re: Performance Issues

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 08:25, Swâmi Petaramesh wrote:

 On Friday 19 September 2014, 13:18:34, Rob Spanton wrote:

  I have a particularly uncomplicated setup (a desktop PC with a hard
  disk) and I'm seeing particularly slow performance from btrfs.

 Weeelll I have the same over-complicated kind of setup, and an Arch Linux
 BTRFS system which used to boot in a decent amount of time now
 takes about 5 full minutes just to make it to the KDM login prompt, and
 another 5 minutes before KDE is fully started. Makes me think of the good ole'
 times of Windows 95 OSR2 on a 486SX with a dying 1 GB hard disk...

Well, part of your problem might be KDE itself, it's extremely CPU
intensive these days.  I'd suggest disabling the 'semantic desktop'
stuff, because that tends to be the worst offender as far as soaking up
system resources.  Also, if you recently switched to systemd, that may
be causing some slowdown as well (journald's default settings are
terrible for performance).

 Now, let me add that I had removed all snapshots, run a full defrag, and even
 rebalanced the damn thing without any positive effect...

 (And yes, my HD is physically in good shape, SMART feels fully happy, and it's
 less than 75% full...)

 I've been using BTRFS for 2-3 years on a dozen different systems, and if
 something doesn't surprise me at all, it's « slow performance », indeed,
 although I'm myself more accustomed to « incredibly fscking damn slow
 performance »...

It's kind of funny, but I haven't had any performance issues with BTRFS
since about 3.10, even on the systems my employer is using Fedora 20 on,
and those use only a Core 2 Duo processor, DDR2-800 RAM, and SATA2 hard
drives.

 HTH







Re: Performance Issues

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 08:49, Austin S Hemmelgarn wrote:

On 2014-09-19 08:18, Rob Spanton wrote:

Hi,

I have a particularly uncomplicated setup (a desktop PC with a hard
disk) and I'm seeing particularly slow performance from btrfs.  A `git
status` in the linux source tree takes about 46 seconds after dropping
caches, whereas on other machines using ext4 this takes about 13s.  My
mail client (evolution) also seems to perform particularly poorly on
this setup, and my hunch is that it's spending a lot of time waiting on
the filesystem.

I've tried mounting with noatime, and this has had no effect.  Anyone
got any ideas?

Here are the things that the wiki page asked for [1]:

uname -a:

 Linux zarniwoop.blob 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8
 11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version:

 Btrfs v3.16

btrfs fi show:

 Label: 'fedora'  uuid: 717c0a1b-815c-4e6a-86c0-60b921e84d75
 Total devices 1 FS bytes used 1.49TiB
 devid1 size 2.72TiB used 1.50TiB path /dev/sda4

 Btrfs v3.16

btrfs fi df /:

 Data, single: total=1.48TiB, used=1.48TiB
 System, DUP: total=32.00MiB, used=208.00KiB
 Metadata, DUP: total=11.50GiB, used=10.43GiB
 unknown, single: total=512.00MiB, used=0.00

dmesg dump is attached.

Please CC any responses to me, as I'm not subscribed to the list.

Cheers,

Rob

[1] https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list



WRT the performance of Evolution, the issue is probably fragmentation of
the data files.  If you run the command:
# btrfs fi defrag -rv /home
you should see some improvement in evolution performance (until you get
any new mail that is).  Evolution (like most graphical e-mail clients
these days) uses sqlite for data storage, and sqlite database files are
one of the known pathological cases for COW filesystems in general; the
solution is to mark the files as NOCOW (see the info about VM images in
[1] and [2], the same suggestions apply to database files).

As for git, I haven't seen any performance issues specific to BTRFS; are
you using any compress= mount option? zlib based compression is known to
cause serious slowdowns.  I don't think that git uses any kind of
database for data storage.  Also, if the performance comparison is from
other systems, unless those systems have the EXACT same hardware
configuration, they aren't really a good comparison.  Unless the pc this
is on is a relatively recent system (less than a year or two old), it
may just be hardware that is the performance bottleneck.


Realized after I sent this that I forgot the links for [1] and [2]

[1] https://btrfs.wiki.kernel.org/index.php/UseCases
[2] https://btrfs.wiki.kernel.org/index.php/FAQ




Re: kernel integration branch updated

2014-09-19 Thread Chris Mason

On 09/18/2014 09:45 PM, Qu Wenruo wrote:
 Hi Chris,
 
 
 I'm sorry that the commit 'btrfs: Fix and enhance merge_extent_mapping()
 to insert best fitted extent map'
 has a V2 patch, so the one in tree is not up-to-date.
 
 The v2 change is quite small and relatively independent, though, so
 it should not be a painful change.

Thanks, please send the v2 as an incremental.  We'll send both to stable.

-chris

Re: Performance Issues

2014-09-19 Thread Holger Hoffstätte

On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:

 I have a particularly uncomplicated setup (a desktop PC with a hard
 disk) and I'm seeing particularly slow performance from btrfs.  A `git
 status` in the linux source tree takes about 46 seconds after dropping
 caches, whereas on other machines using ext4 this takes about 13s.  My

This is - unfortunately - a particular btrfs oddity/characteristic/flaw,
whatever you want to call it. git relies a lot on fast stat() calls,
and those seem to be particularly slow with btrfs esp. on rotational
media. I have the same problem with rsync on a freshly mounted volume;
it gets fast (quite so!) after the first run.

The simplest thing to fix this is a `du -s >/dev/null` to pre-cache all
file inodes.

I'd also love a technical explanation why this happens and how it could
be fixed. Maybe it's just a consequence of how the metadata tree(s)
are laid out on disk.

 I've tried mounting with noatime, and this has had no effect.  Anyone
 got any ideas?  

Don't drop the caches :-)

-h


Re: [PATCH] Revert Btrfs: device_list_add() should not update list when

2014-09-19 Thread Sam Thursfield


On 18/09/14 16:00, Chris Mason wrote:


Johannes and Sam, could you please confirm this patch fixes your mount
regression for now?  Anand, please make sure I kept the generation check
properly.


I've just tested this patch on top of 3.17-rc5 and it fixes the issue 
for me.


Thanks!
Sam


--
Sam Thursfield, Codethink Ltd.
Office telephone: +44 161 236 5575

[PATCH] Btrfs: cleanup error handling in build_backref_tree

2014-09-19 Thread Josef Bacik

When balance panics it tends to panic in the

BUG_ON(!upper->checked);

test, because it means it couldn't build the backref tree properly.  This is
annoying to users and frankly a recoverable error, nothing in this function is
actually fatal since it is just an in-memory building of the backrefs for a
given bytenr.  So go through and change all the BUG_ON()'s to ASSERT()'s, and
fix the BUG_ON(!upper->checked) thing to just return an error.

This patch also fixes the error handling so it tears down the work we've done
properly.  This code was horribly broken since we always just panic'ed instead
of actually erroring out, so it needed to be completely re-worked.  With this
patch my broken image no longer panics when I mount it.  Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/relocation.c | 88 ++-
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2d221c4..19726af 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -736,7 +736,8 @@ again:
 		err = ret;
 		goto out;
 	}
-	BUG_ON(!ret || !path1->slots[0]);
+	ASSERT(ret);
+	ASSERT(path1->slots[0]);
 
 	path1->slots[0]--;
 
@@ -746,10 +747,10 @@ again:
 		 * the backref was added previously when processing
 		 * backref of type BTRFS_TREE_BLOCK_REF_KEY
 		 */
-		BUG_ON(!list_is_singular(&cur->upper));
+		ASSERT(list_is_singular(&cur->upper));
 		edge = list_entry(cur->upper.next, struct backref_edge,
 				  list[LOWER]);
-		BUG_ON(!list_empty(&edge->list[UPPER]));
+		ASSERT(list_empty(&edge->list[UPPER]));
 		exist = edge->node[UPPER];
 		/*
 		 * add the upper level block to pending list if we need
@@ -831,7 +832,7 @@ again:
 				cur->cowonly = 1;
 			}
 #else
-		BUG_ON(key.type == BTRFS_EXTENT_REF_V0_KEY);
+		ASSERT(key.type != BTRFS_EXTENT_REF_V0_KEY);
 		if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) {
 #endif
 			if (key.objectid == key.offset) {
@@ -840,7 +841,7 @@ again:
 				 * backref of this type.
 				 */
 				root = find_reloc_root(rc, cur->bytenr);
-				BUG_ON(!root);
+				ASSERT(root);
 				cur->root = root;
 				break;
 			}
@@ -868,7 +869,7 @@ again:
 			} else {
 				upper = rb_entry(rb_node, struct backref_node,
 						 rb_node);
-				BUG_ON(!upper->checked);
+				ASSERT(upper->checked);
 				INIT_LIST_HEAD(&edge->list[UPPER]);
 			}
 			list_add_tail(&edge->list[LOWER], &cur->upper);
@@ -892,7 +893,7 @@ again:
 
 		if (btrfs_root_level(&root->root_item) == cur->level) {
 			/* tree root */
-			BUG_ON(btrfs_root_bytenr(&root->root_item) !=
+			ASSERT(btrfs_root_bytenr(&root->root_item) ==
 			       cur->bytenr);
 			if (should_ignore_root(root))
 				list_add(&cur->list, &useless);
@@ -927,7 +928,7 @@ again:
 		need_check = true;
 		for (; level < BTRFS_MAX_LEVEL; level++) {
 			if (!path2->nodes[level]) {
-				BUG_ON(btrfs_root_bytenr(&root->root_item) !=
+				ASSERT(btrfs_root_bytenr(&root->root_item) ==
 				       lower->bytenr);
 				if (should_ignore_root(root))
 					list_add(&lower->list, &useless);
@@ -982,7 +983,7 @@ again:
 			} else {
 				upper = rb_entry(rb_node, struct backref_node,
 						 rb_node);
-				BUG_ON(!upper->checked);
+				ASSERT(upper->checked);
 				INIT_LIST_HEAD(&edge->list[UPPER]);
 				if (!upper->owner)
 					upper->owner = btrfs_header_owner(eb);
@@ -1026,7 +1027,7 @@ next:
 	 * everything goes well, connect backref nodes and insert backref nodes
 	 * into the cache.
 	 */
-	BUG_ON(!node->checked);
+	ASSERT(node->checked);
 	cowonly = node->cowonly;
 	if (!cowonly) {
 		rb_node = tree_insert(&cache->rb_root, node->bytenr,
@@ -1062,8 +1063,21 @@ next:
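
The general pattern the commit message describes (report an inconsistency as an error and tear down partial work, rather than panic) can be shown with a small user-space sketch. Nothing here is taken from relocation.c: struct node, build_nodes(), free_nodes() and the choice of -EINVAL are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct node {
	int checked;       /* set once the node has been fully processed */
	struct node *next; /* nodes built so far, newest first */
};

/* Free every node in the list; used for error and normal teardown alike. */
void free_nodes(struct node *head)
{
	while (head != NULL) {
		struct node *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Build `n` nodes. `bad_index` simulates a node that never got marked
 * as checked (pass -1 for none). Where a BUG_ON() would have stopped
 * the machine, this returns -EINVAL after freeing what was built.
 */
int build_nodes(int n, int bad_index, struct node **out)
{
	struct node *head = NULL;
	int err = 0;
	int i;

	assert(out != NULL);
	*out = NULL;

	for (i = 0; i < n; i++) {
		struct node *node = calloc(1, sizeof(*node));

		if (node == NULL) {
			err = -ENOMEM;
			goto out;
		}
		node->checked = (i != bad_index);
		node->next = head;
		head = node;

		if (!node->checked) {
			err = -EINVAL; /* recoverable: caller decides what to do */
			goto out;
		}
	}
out:
	if (err) {
		free_nodes(head); /* undo partial work before returning */
		return err;
	}
	*out = head;
	return 0;
}
```

The single exit label keeps the cleanup in one place, which is the same shape kernel functions usually take once their BUG_ON()s become error returns.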

Re: Performance Issues

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 09:51, Holger Hoffstätte wrote:


 On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:

  I have a particularly uncomplicated setup (a desktop PC with a hard
  disk) and I'm seeing particularly slow performance from btrfs.  A `git
  status` in the linux source tree takes about 46 seconds after dropping
  caches, whereas on other machines using ext4 this takes about 13s.  My
  mail client (evolution) also seems to perform particularly poorly on
  this setup, and my hunch is that it's spending a lot of time waiting on
  the filesystem.

 This is - unfortunately - a particular btrfs oddity/characteristic/flaw,
 whatever you want to call it. git relies a lot on fast stat() calls,
 and those seem to be particularly slow with btrfs esp. on rotational
 media. I have the same problem with rsync on a freshly mounted volume;
 it gets fast (quite so!) after the first run.

I find that kind of funny, because regardless of filesystem, stat() is
one of the *slowest* syscalls on almost every *nix system in existence.

 The simplest thing to fix this is a `du -s >/dev/null` to pre-cache all
 file inodes.

 I'd also love a technical explanation why this happens and how it could
 be fixed. Maybe it's just a consequence of how the metadata tree(s)
 are laid out on disk.

While I don't know for certain, I think it's largely just a side effect
of the lack of performance tuning in the BTRFS code.

  I've tried mounting with noatime, and this has had no effect.  Anyone
  got any ideas?

 Don't drop the caches :-)

 -h







Re: [PATCH] xfstests: remove check_scratch_fs in btrfs/012

2014-09-19 Thread Josef Bacik


On 09/02/2014 11:25 PM, Liu Bo wrote:

From: Liu Bo liub.li...@gmail.com

btrfs/012 is a case to verify btrfs-convert feature, it converts an ext4 to
btrfs firstly and do something, then rolls back to ext4.

So at last we have a ext4 on the scratch device, but setting _require_scratch
will force a btrfsck on a ext4 fs because $FSTYP here is btrfs, and it ends up
with a failure report of _check_btrfs_filesystem.

Now that we have deliberately check the final ext4 fs in btrfs/012, just do not
set _require_scratch in this case.

Signed-off-by: Liu Bo liub.li...@gmail.com


I sent a patch for this already, it's on the fs-tests list.  Thanks,

Josef


Re: Performance Issues

2014-09-19 Thread Josef Bacik


On 09/19/2014 08:18 AM, Rob Spanton wrote:

Hi,

I have a particularly uncomplicated setup (a desktop PC with a hard
disk) and I'm seeing particularly slow performance from btrfs.  A `git
status` in the linux source tree takes about 46 seconds after dropping
caches, whereas on other machines using ext4 this takes about 13s.  My
mail client (evolution) also seems to perform particularly poorly on
this setup, and my hunch is that it's spending a lot of time waiting on
the filesystem.



Weird, I get the exact opposite performance.  Anyway it's probably 
because of your file layouts, try defragging your git dir and see if 
that helps.  Thanks,


Josef

Re: [PATCH v2 0/2] Move BTRFS RCU string to common library

2014-09-19 Thread Paul E. McKenney

On Fri, Sep 19, 2014 at 02:01:28AM -0700, Omar Sandoval wrote:
 This patch series moves the generic RCU string library used internally by
 BTRFS to be accessible by anyone. It provides printk_in_rcu and
 printk_ratelimited_in_rcu to print these strings. In order to avoid a weird
 inconsistency between the two, the first patch fixes printk_ratelimited so it
 passes on the return value from printk.
 
 The second patch actually moves the RCU string library. Version 2 passes on
 the return values from printk{,_ratelimited} and fixes some style issues.
 
 Omar Sandoval (2):

For the series:

Acked-by: Paul E. McKenney paul...@linux.vnet.ibm.com

   Return a value from printk_ratelimited
   Move BTRFS RCU string to common library
 
  fs/btrfs/check-integrity.c |  6 +--
  fs/btrfs/dev-replace.c | 19 +-
  fs/btrfs/disk-io.c |  6 +--
  fs/btrfs/extent_io.c   |  4 +-
  fs/btrfs/ioctl.c   |  4 +-
  fs/btrfs/raid56.c  |  2 +-
  fs/btrfs/rcu-string.h  | 56 
  fs/btrfs/scrub.c   | 15 
  fs/btrfs/super.c   |  2 +-
  fs/btrfs/volumes.c | 14 +++
  include/linux/printk.h |  4 +-
  include/linux/rcustring.h  | 91 ++
  12 files changed, 131 insertions(+), 92 deletions(-)
  delete mode 100644 fs/btrfs/rcu-string.h
  create mode 100644 include/linux/rcustring.h
 
 -- 
 2.1.0
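
As an aside for readers who have not looked at the first patch, the idea of a rate-limited print that still hands back printk's return value can be sketched in user space as below. This is not the kernel macro: struct ratelimit, ratelimit_ok() and print_ratelimited() are hypothetical, the limiter is a plain message counter rather than the kernel's time-based one, and the macro relies on GNU statement expressions, so it assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical limiter: allow at most `burst` messages, then suppress. */
struct ratelimit {
	int burst;   /* number of messages allowed through */
	int printed; /* number of messages emitted so far */
};

/* Return 1 if a message may be emitted, 0 if it should be dropped. */
int ratelimit_ok(struct ratelimit *rl)
{
	assert(rl != NULL);
	if (rl->printed >= rl->burst)
		return 0;
	rl->printed++;
	return 1;
}

/*
 * Evaluates to printf()'s return value when the message is emitted and
 * to 0 when it is suppressed, so callers can treat it like printf().
 */
#define print_ratelimited(rl, fmt, ...)				\
({								\
	int ret_ = 0;						\
	if (ratelimit_ok(rl))					\
		ret_ = printf(fmt, ##__VA_ARGS__);		\
	ret_;							\
})
```

Because the macro is an expression, a wrapper built on top of it can simply return its value, which is the consistency between the plain and rate-limited variants that the cover letter is after.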
 


Re: [PATCH v2 0/2] Move BTRFS RCU string to common library

2014-09-19 Thread Paul E. McKenney

On Fri, Sep 19, 2014 at 11:47:53AM -0400, Chris Mason wrote:
 
 
 On 09/19/2014 11:45 AM, Paul E. McKenney wrote:
  On Fri, Sep 19, 2014 at 02:01:28AM -0700, Omar Sandoval wrote:
   This patch series moves the generic RCU string library used internally by
   BTRFS to be accessible by anyone. It provides printk_in_rcu and
   printk_ratelimited_in_rcu to print these strings. In order to avoid a weird
   inconsistency between the two, the first patch fixes printk_ratelimited so
   it passes on the return value from printk.
  
   The second patch actually moves the RCU string library. Version 2 passes
   on the return values from printk{,_ratelimited} and fixes some style issues.
 
  Omar Sandoval (2):
  
  For the series:
  
  Acked-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 
 Fine by me too, Paul, do you want to merge it in?

I would be happy to.

Are you thinking in terms of 3.18 or 3.19?  These look OK either way, but
thought I should check.

Thanx, Paul


Re: [PATCH v2 0/2] Move BTRFS RCU string to common library

2014-09-19 Thread Chris Mason

On 09/19/2014 12:05 PM, Paul E. McKenney wrote:
 On Fri, Sep 19, 2014 at 11:47:53AM -0400, Chris Mason wrote:


 On 09/19/2014 11:45 AM, Paul E. McKenney wrote:
 On Fri, Sep 19, 2014 at 02:01:28AM -0700, Omar Sandoval wrote:
 This patch series moves the generic RCU string library used internally by
 BTRFS to be accessible by anyone. It provides printk_in_rcu and
 printk_ratelimited_in_rcu to print these strings. In order to avoid a weird
 inconsistency between the two, the first patch fixes printk_ratelimited so
 it passes on the return value from printk.

 The second patch actually moves the RCU string library. Version 2 passes
 on the return values from printk{,_ratelimited} and fixes some style issues.

 Omar Sandoval (2):

 For the series:

 Acked-by: Paul E. McKenney paul...@linux.vnet.ibm.com

 Fine by me too, Paul, do you want to merge it in?
 
 I would be happy to.
 
 Are you thinking in terms of 3.18 or 3.19?  These look OK either way, but
 thought I should check.

Either way is fine with me.  Actually this will have minor conflicts
with my current branch headed for-next, so I can resolve and send as a
stand alone pull.

-chris


Re: Performance Issues

2014-09-19 Thread Holger Hoffstätte

On Fri, 19 Sep 2014 10:53:03 -0400, Austin S Hemmelgarn wrote:

 I find that kind of funny, because regardless of filesystem, stat() is 
 one of the *slowest* syscalls on almost every *nix system in existence.

Sure. I didn't mean to imply that stat() in its various incarnations is
fast by itself, just that git relies a lot on it since it necessarily needs
to look at every file.

-h


Re: Performance Issues

2014-09-19 Thread Rob Spanton

Hi,

Thanks for the response everyone.

I wrote:
 I have a particularly uncomplicated setup (a desktop PC with a hard
 disk) and I'm seeing particularly slow performance from btrfs.  A `git
 status` in the linux source tree takes about 46 seconds after dropping
 caches, whereas on other machines using ext4 this takes about 13s.  My
 mail client (evolution) also seems to perform particularly poorly on
 this setup, and my hunch is that it's spending a lot of time waiting on
 the filesystem.

The evolution problem has been improved: the sqlite db that it was using
had over 18000 fragments, so I got evolution to recreate that file with
nocow set.  It now takes only 30s to load my mail rather than 80s,
which is better...

On Fri, 2014-09-19 at 11:05 -0400, Josef Bacik wrote:
 Weird, I get the exact opposite performance.  Anyway it's probably 
 because of your file layouts, try defragging your git dir and see if 
 that helps.  Thanks,

Defragging has improved matters a bit: it now takes 26s (was 46s) to run
git status.  Still not amazing, but at the moment I have no evidence to
suggest that it's not something to do with the machine's hardware.  If I
get time over the weekend I'll dig out an external hard disk and try a
couple of benchmarks with that.

For reference, these are the mount flags:
/dev/sda4 on / type btrfs (rw,noatime,space_cache)
/dev/sda4 on /home type btrfs (rw,noatime,space_cache)

Cheers,

Rob




Re: Problem with unmountable filesystem.

2014-09-19 Thread Chris Murphy

Possibly btrfs-select-super can do some of the things I was doing the hard way. 
It's possible to select a super to overwrite other supers, even if they're 
good ones. Whereas btrfs rescue super-recover won't do that, and neither will 
btrfsck, hence why I corrupted the one I didn't want first. This command isn't 
built by default (at least not on Fedora).

Chris

Re: Help for creating a useful bugreport

2014-09-19 Thread Chris Murphy


On Sep 19, 2014, at 2:58 AM, Jakob Breier jakob.bre...@rwth-aachen.de wrote:
 Unfortunately I don't have much to work with. Can you help me with extracting 
 enough information to create a useful bugreport?

What storage device(s)?

Include results from
# btrfs check

Also note whether you get different results with -s1, -s2, -s3 (how many
backup superblocks you have depends on file system size, so some of those might
not work).

Since it won't mount you can't get fi df output, but if you remember that info,
please provide it so we know whether, e.g., the metadata is single (the default
on SSD) or DUP.

Was it created with btrfs-progs 3.16, and has it only been written to with 
kernel 3.16 or other kernels also?

If you can use btrfs-image per the wiki, and keep the image around, it might 
come in handy for a Btrfs developer.


 
 Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify failed on 
 46678016 wanted 4923 found 3306
 Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify failed on 
 46678016 wanted 4923 found 3306

These messages come up often on the list. The notes written in disk-io.c say 
this:
 * we can't consider a given block up to date unless the transid of the
 * block matches the transid in the parent node's pointer.  This is how we
 * detect blocks that either didn't get written at all or got written
 * in the wrong place.

I don't know whether this definitely means hardware related problems of some 
sort, but it sounds suspiciously like that because blocks should get written in 
the correct place. Right? But they didn't.
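
To make that comment concrete, here is a minimal Python sketch of the comparison it describes. It is not the kernel code: the `BlockPointer` and `BlockHeader` types and the `verify_parent_transid` helper are made up for illustration, and only the message format mirrors the log lines quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockPointer:
    """What the parent node records about a child (hypothetical type)."""
    bytenr: int
    transid: int  # generation the parent expects the child to have


@dataclass
class BlockHeader:
    """What was actually read back from disk (hypothetical type)."""
    bytenr: int
    transid: int


def verify_parent_transid(ptr: BlockPointer, found: BlockHeader) -> Optional[str]:
    """Return None if the block is up to date, else a kernel-style message."""
    if ptr.transid == found.transid:
        return None
    # A stale transid means the new copy never reached this location:
    # either it was not written at all or it was written somewhere else.
    return (f"parent transid verify failed on {ptr.bytenr} "
            f"wanted {ptr.transid} found {found.transid}")


print(verify_parent_transid(BlockPointer(46678016, 4923),
                            BlockHeader(46678016, 3306)))
```

Fed the numbers from the log, the sketch reproduces the "wanted 4923 found 3306" line: the parent points at generation 4923, but the block found at that location is still from generation 3306.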


 Sep 19 10:16:18 localhost.localdomain kernel: BTRFS: Failed to read block 
 groups: -5

This came up in a recent thread, "Problem with a filesystem". I'm not sure what
it means.

Once you've taken the btrfs-image, and you're about ready to toss the file 
system it's worth trying these commands.

btrfs rescue super-recover -v
## if it fixes anything, don't continue, try to mount the fs
btrfs check --repair
## I'd try mounting even if it doesn't say it's repaired anything
btrfs check --repair --init-extent-tree
## Again try to mount the fs

And report kernel and user space messages.


Chris Murphy


Re: [PATCH v2 1/2] Return a value from printk_ratelimited

2014-09-19 Thread Steven Rostedt

On Fri, 19 Sep 2014 02:01:29 -0700
Omar Sandoval osan...@osandov.com wrote:

 printk returns an integer; there's no reason for printk_ratelimited to swallow
 it.
 
 Signed-off-by: Omar Sandoval osan...@osandov.com
 ---
  include/linux/printk.h | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/include/linux/printk.h b/include/linux/printk.h
 index d78125f..67534bc 100644
 --- a/include/linux/printk.h
 +++ b/include/linux/printk.h
 @@ -343,12 +343,14 @@ extern asmlinkage void dump_stack(void) __cold;
  #ifdef CONFIG_PRINTK
  #define printk_ratelimited(fmt, ...) \
  ({   \
 + int __ret = 0;  \

My only issue is with the __ret name.

It's not really unique enough. If something else uses __ret and does

  printk_ratelimited("some fmt string %d\n", __ret);

this will not print the right value.

printk_ratelimited can be used almost anywhere, so using a really unique
name may be worthwhile here.

What about:

  int __r

?

-- Steve

   static DEFINE_RATELIMIT_STATE(_rs,  \
 DEFAULT_RATELIMIT_INTERVAL,   \
 DEFAULT_RATELIMIT_BURST); \
   \
   if (__ratelimit(&_rs))  \
 - printk(fmt, ##__VA_ARGS__); \
 + __ret = printk(fmt, ##__VA_ARGS__); \
 + __ret;  \
  })
  #else
  #define printk_ratelimited(fmt, ...) \


Re: Problem with unmountable filesystem.

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 13:07, Chris Murphy wrote:

> Possibly btrfs-select-super can do some of the things I was doing the hard
> way. It's possible to select a super to overwrite other supers, even if
> they're good ones. Whereas btrfs rescue super-recover won't do that, and
> neither will btrfsck, hence why I corrupted the one I didn't want first.
> This command isn't built by default (at least not on Fedora).
I don't think it's built by default on any of the major distributions. 
On Gentoo you need to set package specific configure options.






Re: Performance Issues

2014-09-19 Thread Josef Bacik


On 09/19/2014 11:51 AM, Rob Spanton wrote:

> Hi,
>
> Thanks for the response everyone.
>
> I wrote:
>> I have a particularly uncomplicated setup (a desktop PC with a hard
>> disk) and I'm seeing particularly slow performance from btrfs.  A `git
>> status` in the linux source tree takes about 46 seconds after dropping
>> caches, whereas on other machines using ext4 this takes about 13s.  My
>> mail client (evolution) also seems to perform particularly poorly on
>> this setup, and my hunch is that it's spending a lot of time waiting on
>> the filesystem.
>
> The evolution problem has been improved: the sqlite db that it was using
> had over 18000 fragments, so I got evolution to recreate that file with
> nocow set.  It now takes only 30s to load my mail rather than 80s,
> which is better...
>
> On Fri, 2014-09-19 at 11:05 -0400, Josef Bacik wrote:
>> Weird, I get the exact opposite performance.  Anyway it's probably
>> because of your file layouts, try defragging your git dir and see if
>> that helps.  Thanks,
>
> Defragging has improved matters a bit: it now takes 26s (was 46s) to run
> git status.  Still not amazing, but at the moment I have no evidence to
> suggest that it's not something to do with the machine's hardware.  If I
> get time over the weekend I'll dig out an external hard disk and try a
> couple of benchmarks with that.
>
> For reference, these are the mount flags:
>  /dev/sda4 on / type btrfs (rw,noatime,space_cache)
>  /dev/sda4 on /home type btrfs (rw,noatime,space_cache)



You have an awful lot of metadata, do you have a lot of snapshots?  Also 
I'd be interested in making sure most of this is just from shitty 
metadata layout, could you make sure you have a recent version of 
trace-cmd and then drop caches and do


trace-cmd record -e sched:sched_switch git status

and send me the trace.dat so I can see where all the time is spent?  Thanks,

Josef


Re: Performance Issues

2014-09-19 Thread Zach Brown

On Fri, Sep 19, 2014 at 01:51:22PM +0000, Holger Hoffstätte wrote:
 
 On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
 
  I have a particularly uncomplicated setup (a desktop PC with a hard
  disk) and I'm seeing particularly slow performance from btrfs.  A `git
  status` in the linux source tree takes about 46 seconds after dropping
  caches, whereas on other machines using ext4 this takes about 13s.  My
  mail client (evolution) also seems to perform particularly poorly on
  this setup, and my hunch is that it's spending a lot of time waiting on
  the filesystem.
 
 This is - unfortunately - a particular btrfs oddity/characteristic/flaw,
 whatever you want to call it. git relies a lot on fast stat() calls,
 and those seem to be particularly slow with btrfs esp. on rotational
 media. I have the same problem with rsync on a freshly mounted volume;
 it gets fast (quite so!) after the first run.
 
 The simplest thing to fix this is a "du -s >/dev/null" to pre-cache all
 file inodes.
 
 I'd also love a technical explanation why this happens and how it could
 be fixed. Maybe it's just a consequence of how the metadata tree(s)
 are laid out on disk.

There's a lot of meat behind that "just a consequence" but, yes, that's
the heart of it.  Different metadata designs result in different io
patterns which single rotating drives are exquisitely sensitive to.

You can look for differences in io patterns with iostat, blktrace,
iowatcher, etc.  They'll show differences in io sizes, concurrency,
locality, and often differences in the amount of blocks of data read.

  http://masoncoding.com/iowatcher/

As for fixing it, well, it's arguably working as intended.  If you
turned btrfs from one cow tree into lots of journaled trees of trees
then, well, we'd be left with an absurd reimplementation of ext*|xfs.

- z

Re: Problem with unmountable filesystem.

2014-09-19 Thread Chris Murphy


On Sep 17, 2014, at 5:23 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:

[   30.920536] BTRFS: bad tree block start 0 130402254848
[   30.924018] BTRFS: bad tree block start 0 130402254848
[   30.926234] BTRFS: failed to read log tree
[   30.953055] BTRFS: open_ctree failed

I'm still confused. Btrfs knows this tree root is bad, but it has backup roots. 
So why wasn't one of those used by -o recovery? I thought that's the whole 
point of that mount option. Backup tree roots are per superblock, so 
conceivably you'd have up to 8 of these with two superblocks, they're shown with
btrfs-show-super -af  ## and -F even if a super is bad

But skipping that, to fix this you need to know which super is pointing to the 
wrong tree root, since you're using ssd mount option with rotating supers. I 
assume mount uses the super with the highest generation number. So you'd need 
to:
btrfs-show-super -a
to find out the super with the most recent generation. You'd assume that one 
was wrong. And then use btrfs-select-super to pick the right one, and replace 
the wrong one. Then you could mount.

I also wonder if btrfs check -sX would show different results in your case. I'd 
think it would because it ought to know one of those tree roots is bad, seeing 
as mount knows it. And then it seems (I'm speculating a ton) that --repair 
might try to fix the bad tree root, and then if it fails I'd like to think it 
can just find the most recent good tree root, ideally one listed as a 
backup_tree_root by any good superblock, and then have the next mount use that.

I'm not sure why this persistently fails, and I wonder if there are cases of 
users giving up and blowing away file systems that could actually be mountable. 
But it's just really a manual process figuring out what things to do in what 
order to get them to mount. 


Chris Murphy

Re: [PATCH v2 1/2] Return a value from printk_ratelimited

2014-09-19 Thread Joe Perches

On Fri, 2014-09-19 at 13:21 -0400, Steven Rostedt wrote:
 On Fri, 19 Sep 2014 02:01:29 -0700
 Omar Sandoval osan...@osandov.com wrote:
 
  printk returns an integer; there's no reason for printk_ratelimited to 
  swallow
  it.

Except for the lack of usefulness of the return value itself.
See: https://lkml.org/lkml/2009/10/7/275



Re: Problem with unmountable filesystem.

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 13:54, Chris Murphy wrote:


> On Sep 17, 2014, at 5:23 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
>
>> [   30.920536] BTRFS: bad tree block start 0 130402254848
>> [   30.924018] BTRFS: bad tree block start 0 130402254848
>> [   30.926234] BTRFS: failed to read log tree
>> [   30.953055] BTRFS: open_ctree failed
>
> I'm still confused. Btrfs knows this tree root is bad, but it has backup
> roots. So why wasn't one of those used by -o recovery? I thought that's the
> whole point of that mount option. Backup tree roots are per superblock, so
> conceivably you'd have up to 8 of these with two superblocks, they're shown
> with
> btrfs-show-super -af  ## and -F even if a super is bad
>
> But skipping that, to fix this you need to know which super is pointing to
> the wrong tree root, since you're using ssd mount option with rotating
> supers. I assume mount uses the super with the highest generation number.
> So you'd need to:
> btrfs-show-super -a
> to find out the super with the most recent generation. You'd assume that
> one was wrong. And then use btrfs-select-super to pick the right one, and
> replace the wrong one. Then you could mount.
>
> I also wonder if btrfs check -sX would show different results in your case.
> I'd think it would because it ought to know one of those tree roots is bad,
> seeing as mount knows it. And then it seems (I'm speculating a ton) that
> --repair might try to fix the bad tree root, and then if it fails I'd like
> to think it can just find the most recent good tree root, ideally one
> listed as a backup_tree_root by any good superblock, and then have the next
> mount use that.
>
> I'm not sure why this persistently fails, and I wonder if there are cases
> of users giving up and blowing away file systems that could actually be
> mountable. But it's just really a manual process figuring out what things
> to do in what order to get them to mount.

From what I can tell, btrfs check doesn't do anything about backup 
superblocks unless you specifically tell it to.  In this case, running 
btrfs check without specifying a superblock mirror, and with explicitly 
specifying the primary superblock produced identical results (namely it
choked, hard, with an error message similar to that from the kernel).
However, running it with -s1 to select the first backup superblock 
returned no errors at all other than the space_cache being invalid and 
the count of used blocks being wrong.


Based on my (limited) understanding of the mount code, it does try to 
use the superblock with the highest generation (regardless of whether we 
are on an ssd or not), but doesn't properly fall back to a secondary 
superblock after trying to mount using the primary.
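
A rough illustration of the difference between the behaviour described here and the fallback one would hope for, as a small Python sketch. It is only a model: `Super`, `root_readable`, both functions, and the generation numbers are hypothetical, and the real mount code is considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Super:
    mirror: int          # 0 = primary, 1.. = backup copies
    generation: int
    root_readable: bool  # stand-in for "the tree root it points to can be read"


def pick_highest_generation(supers: List[Super]) -> Super:
    """Behaviour described above: take the newest super and stop there."""
    return max(supers, key=lambda s: s.generation)


def pick_with_fallback(supers: List[Super]) -> Optional[Super]:
    """Hoped-for behaviour: walk supers newest-first until one is usable."""
    for s in sorted(supers, key=lambda s: s.generation, reverse=True):
        if s.root_readable:
            return s
    return None


# Made-up example: the primary is newest but points at a bad tree root.
supers = [Super(0, 120, False), Super(1, 119, True), Super(2, 119, True)]
print(pick_highest_generation(supers).mirror)
print(pick_with_fallback(supers).mirror)
```

In this model the first function keeps returning the primary even though its root is unreadable, while the second settles on the first backup, which matches the -s1 result reported above.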


As far as btrfs check --repair trying to fix this, I don't think that it
does so currently, probably for the same reason that mount fails.






Re: Single disk parrallelization

2014-09-19 Thread Austin S Hemmelgarn


On 2014-09-19 14:10, Jeb Thomson wrote:

> With the advanced features of btrfs, it would be an additional simple task
> to make different platters run in parallel.
>
> In this case, say a disk has three platters, and so three seek heads as
> well. If we can identify that much, and what offsets they are at, it then
> becomes a trivial matter to place the reads and writes to different
> platters at the same time.
>
> In effect, this means each platter should be operating as a single
> virtualized unit, instead of one single unit...


In theory this is a great idea except for two things:
1) Most consumer drives have only one platter.
2) The kernel doesn't have such low-level hardware access, so it would 
have to be implemented in device firmware (and I'd be willing to bet 
that most drive manufacturers already stripe data across multiple 
platters when possible).






[PATCH] Btrfs: fix build_backref_tree issue with multiple shared blocks

2014-09-19 Thread Josef Bacik

Marc Merlin sent me a broken fs image months ago where it would blow up in the
upper->checked BUG_ON() in build_backref_tree.  This is because we had a
scenario like this

block a -- level 4 (not shared)
   |
block b -- level 3 (reloc block, shared)
   |
block c -- level 2 (not shared)
   |
block d -- level 1 (shared)
   |
block e -- level 0 (shared)

We go to build a backref tree for block e, we notice block d is shared and add
it to the list of blocks to look up its backrefs for.  Now when we loop around
we will check edges for the block, so we will see we looked up block c last
time.  So we look up block d and then see that the block that points to it is
block c and we can just skip that edge since we've already been up this path.
The problem is that because we clear need_check when we see block d (as it is
shared), we never add block b as needing to be checked.  And because block c is
in our path already we bail out before we walk up to block b and add it to the
backref check list.

To fix this we need to reset need_check if we trip over a block that doesn't
need to be checked.  This will make sure that any subsequent blocks in the path
as we're walking up afterwards are added to the list to be processed.  With this
patch I can now mount Marc's fs image and it'll complete the balance without
panicking.  Thanks,

Reported-by: Marc MERLIN m...@merlins.org
Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/relocation.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 19726af..b55ea37 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -978,8 +978,11 @@ again:
 					need_check = false;
 					list_add_tail(&edge->list[UPPER],
 						      &list);
-				} else
+				} else {
+					if (upper->checked)
+						need_check = true;
 					INIT_LIST_HEAD(&edge->list[UPPER]);
+				}
 			} else {
 				upper = rb_entry(rb_node, struct backref_node,
 						 rb_node);
-- 
1.8.3.1


[GIT PULL] Btrfs fixes

2014-09-19 Thread Chris Mason

Hi Linus,

We have two more fixes for pulling:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

I've got a revert to fix a regression with btrfs device registration,
and Filipe has part two of his fsync fix from last week.

Chris Mason (1) commits (+6/-7):
Revert "Btrfs: device_list_add() should not update list when mounted"

Filipe Manana (1) commits (+13/-14):
Btrfs: set inode's logged_trans/last_log_commit after ranged fsync

Total: (2) commits (+19/-21)

 fs/btrfs/btrfs_inode.h | 13 +++--
 fs/btrfs/tree-log.c| 14 ++
 fs/btrfs/volumes.c | 13 ++---
 3 files changed, 19 insertions(+), 21 deletions(-)

Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-19 Thread William Hanson

Hey guys...

I was just crawling through the wiki and this list's archive to find
answers to some questions.
Many of them actually match those which Christoph asked here some time
ago, though it seems no answers came up at all.

Isn't it possible to answer them, at least one by one? I'd believe that
most of these questions and their answers would be of common interest,
and having them properly answered should be a benefit for all potential
btrfs users.

Regards,
William.


On Sun, 2014-08-31 at 06:02 +0200, Christoph Anton Mitterer wrote:
 Hey.


 For some time now I have been considering using btrfs at a larger scale,
 basically in two scenarios:

 a) As the backend for data pools handled by dcache (dcache.org), where
 we run a Tier-2 in the higher PiB range for the LHC Computing Grid...
 For now that would be rather boring use of btrfs (i.e. not really
 using any of its advanced features) and also RAID functionality would
 still be provided by hardware (at least with the current hardware
 generations we have in use).

 b) Personally, for my NAS. Here the main goal is less performance but
 rather data safety (i.e. I want something like RAID6 or better) and
 security (i.e. it will be on top of dm-crypt/LUKS) and integrity.
 Hardware-wise I'll use a UPS as well as enterprise SATA disks, from
 different vendors and/or different production lots.
 (Of course I'm aware that btrfs is experimental, and I would have
 regular backups)




 1) Now I've followed linux-btrfs for a while and blogs like Marc's...
 and I still read about a lot of stability problems, some of which sound
 quite serious.
 Sure we have a fsck now, but even in the wiki one can read statements
 like "the developers use it on their systems without major problems"...
 but also "if you do this, it could help you... or break even more".

 I mean I understand that there won't be a single point in time where
 Chris Mason says "now it's stable" and it would be rock solid from that
 point on... but especially since new features (e.g. things like
 subvolume quota groups, online/offline dedup, online/offline fsck) move
 (or will move) in with every new version... one has (as an end-user)
 basically no chance to determine what can be used safely and what
 tickles the devil.

 So one issue I have is to determine the general stability of the
 different parts.




 2) Documentation status...
 I feel that some general and extensive documentation is missing. One
 that basically handles (and teaches) all the things which are specific
 to modern (especially CoW) filesystems.
 - General design, features and problems of CoW and btrfs
 - Special situations that arise from CoW, e.g. that one may not be
 able to remove files once the fs is full,... or that just reading files
 could make the used space grow (via the atime)
 - General guidelines on when and how to use nodatacow... i.e. telling
 people for which kinds of files this SHOULD usually be done (VM
 images)... and what this means for those files (no checksumming) and
 what the drawbacks are if it's not used (e.g. if people insist on
 having the checksumming - what happens to the performance of VM images?
 what about the wear with SSDs?)
 - The implications of things like compression and hash algos... whether
 and when this will have performance impacts (positive or negative) and
 when not.
 - The typical lifecycles and procedures when using stuff like multiple
 devices (how to replace a faulty disk) or important hints like (don't
 span a btrfs RAID over multiple partitions on the same disk)
 - Especially with the different (mount) options, I mean things that
 change the way the fs works like no-hole or mixed data/meta block
 groups... people need to have some general information on when to choose
 which and some real world examples of disadvantages / advantages. E.g.
 what are the disadvantages of having mixed data/meta block groups? If
 there were only advantages, why wouldn't it be the default?

 Parts of this are already scattered over LWN articles, the wiki (however
 the quality greatly varies there), blog posts or mailing list posts...
 much of the information there is however outdated... and suggested
 procedures (e.g. how to replace a faulty disk) differ from example to
 example.
 An admin that wants to use btrfs shouldn't be required to piece all this
 together (which is basically impossible).. there should be a manpage
 (which is kept up to date!) that describes all this.

 Other important things to document (which I couldn't find so far in most
 cases): What is actually guaranteed by btrfs and its design?
 For example:
 - If there were no bugs in the code,.. would the fs be guaranteed to be
 always consistent by its CoW design? Or are there circumstances where
 it can still run into being inconsistent?
 - Does this basically mean that even without an fs journal,.. my
 database is always consistent even if I have a power cut or system
 crash?
 - At which places does checksumming take place? Just

Re: Single disk parrallelization

2014-09-19 Thread Ralf-Peter Rohbeck


On 09/19/2014 11:10 AM, Jeb Thomson wrote:

> With the advanced features of btrfs, it would be an additional simple task
> to make different platters run in parallel.
>
> In this case, say a disk has three platters, and so three seek heads as
> well. If we can identify that much, and what offsets they are at, it then
> becomes a trivial matter to place the reads and writes to different
> platters at the same time.
>
> In effect, this means each platter should be operating as a single
> virtualized unit, instead of one single unit...
>
> Regards,

A disk drive has only one actuator that moves all heads in parallel. 
Also disk drives are an array of logical blocks today; nobody uses 
cylinder/head/sector addressing any more.



Help Out with the Btrfs Code base and User Space Tools

2014-09-19 Thread nick

Hey Fellow Developers,
I am new to working on the Linux kernel and am interested in helping out with
the btrfs file system and its respective user space tools. If anyone either has
some work for me or would like to mentor me with the code base, that would be
greatly appreciated. In addition, I hope to eventually do this professionally
as an actual career.
Cheers and Thanks, Nick

Re: Performance Issues

2014-09-19 Thread Duncan

Rob Spanton posted on Fri, 19 Sep 2014 17:51:09 +0100 as excerpted:

 The evolution problem has been improved: the sqlite db that it was using
 had over 18000 fragments, so I got evolution to recreate that file with
 nocow set.  It now takes only 30s to load my mail rather than 80s,
 which is better...
 
 On Fri, 2014-09-19 at 11:05 -0400, Josef Bacik wrote:
 Weird, I get the exact opposite performance.  Anyway it's probably
 because of your file layouts, try defragging your git dir and see if
 that helps.  Thanks,
 
 Defragging has improved matters a bit: it now takes 26s (was 46s) to run
 git status.  Still not amazing, but at the moment I have no evidence to
 suggest that it's not something to do with the machine's hardware.  If I
 get time over the weekend I'll dig out an external hard disk and try a
 couple of benchmarks with that.

[Replying via mail and list both, as requested.]

If you're snapshotting those nocow files, be aware (if you aren't 
already) that nocow, snapshots and defrag (all on the same files) don't 
work all that well together...

First let's deal with snapshots of nocow files.

What does a snapshot do?  It locks in place the existing version of a 
file, both logically, so you can get at that version of it via the 
snapshot even after changes have been made, and physically, it locks 
existing extents where they are.  With normal cow files this is fine, 
since any changes would cause the changed block to be written elsewhere, 
freeing the now replaced block if there's nothing holding it in place.  A 
snapshot simply keeps a reference to the existing extent when the data is 
cowed elsewhere instead of releasing it, so there's a way to get the old 
version as referenced by that snapshot back too.

But nocow files are normally overwritten in place, that's what nocow 
/is/.  Obviously that conflicts with what a snapshot does, locking the 
existing version in place.

What btrfs does, then, to handle that, is on the first write to a (4KB) 
block in a (normally) nocow file after a snapshot, a cow write is forced 
on that block anyway.  The file remains nocow, and additional writes to 
the /same/ block continue to write to the same new location... until 
another snapshot locks /that/ in place.

All fine if you're just doing occasional snapshots and/or if the nocow 
file isn't being very actively rewritten after all; it's not that big a 
deal in that case.  *BUT*, if you're doing time-based snapshots say every 
hour or so, and the file is actively being semi-randomly rewritten, the 
constant snapshotting locking in place the current version, forcing many 
of those writes to cow anyway, is going to end up fragmenting that file 
nearly as fast as it would without the nocow.  IOW, the nocow ends up 
being nearly worthless on that file!
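
The effect can be seen in a toy model. The Python sketch below is only an illustration under simplifying assumptions: the file is fully allocated up front, every snapshot pins every block, and `count_forced_cow` is a made-up helper, not anything btrfs provides.

```python
from typing import Iterable


def count_forced_cow(n_blocks: int, writes: Iterable[int], snapshot_every: int) -> int:
    """Count writes to a nocow file that get cowed anyway because of snapshots.

    n_blocks:        size of the (already allocated) file in blocks
    writes:          block indexes, in the order they are rewritten
    snapshot_every:  take a snapshot before every N-th write; 0 means never
    """
    pinned = set()  # blocks whose current location a snapshot still references
    forced = 0
    for i, block in enumerate(writes):
        if snapshot_every and i % snapshot_every == 0:
            pinned = set(range(n_blocks))  # a snapshot locks every block in place
        if block in pinned:
            forced += 1            # first write after the snapshot: forced cow
            pinned.discard(block)  # later writes hit the new location in place
    return forced


writes = [0, 0, 1, 0, 1, 2]
print(count_forced_cow(4, writes, 0))
print(count_forced_cow(4, writes, 6))
print(count_forced_cow(4, writes, 3))
```

Without snapshots nothing is relocated; a single snapshot relocates each touched block once (3 writes); snapshotting twice as often pushes that to 5 of the 6 writes, which is the fragmentation effect described above.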

There is a (partial) workaround, however.  You can use the fact that 
snapshots stop at subvolume boundaries, putting the nocow files on their 
own dedicated subvolume.  You can then continue snapshotting the up-tree 
subvolume as you were before and it'll stop at the dedicated subvolume, 
so the nocow files on that subvolume don't get snapshotted and thus don't 
get fragmented anyway.

Of course without that snapshotting you'll need to do conventional backup 
on the files in that dedicated nocow subvolume.

Another alternative is to continue snapshotting the dedicated subvolume 
and its nocow files, but at a lower frequency, perhaps every day or twice 
a day instead of every hour, or maybe twice a week instead of daily, or 
whatever.  That will slow down but not eliminate the snapshot-triggered 
fragmentation of the nocow files.

If you then combine that with scheduled (presumably cron job or systemd-
timer) defrag of that dedicated subvolume, perhaps weekly or monthly, 
depending on how fast it still fragments, that can help keep performance 
from dragging down too badly.

Of course you can use the scheduled defrag technique without the 
dedicated subvolume and just up the frequency of the defrags instead of 
decreasing the frequency of the snapshotting, too, if it works better for 
you.


Meanwhile, how big are those files?  If you're not dealing with any nocow-
candidate files approaching a gig or larger, you may find that the 
autodefrag mount option helps.  However, it works by queuing up a rewrite 
of the entire file for a worker thread that comes along a bit later, and 
if the file is too big and being written to too much, the changes to the 
file can end up coming faster than the file can be rewritten.  Obviously 
that's not a good thing.  Generally, for files under 100 MB autodefrag 
works very well.  For actively rewritten files over a GB, it doesn't work 
well at all, and for files between 100 MB and 1 GB, it depends on the 
speed of your hardware and how fast the rewrites are coming in.  
Actually, most folks seem to be OK up to a quarter GiB or so, and most 
folks have problems starting around 3/4 GiB or so.  256-768 MiB is the 
YMMV zone.
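
As a rule-of-thumb summary of those figures, a small Python sketch follows. The `autodefrag_outlook` helper and its labels are hypothetical, the cut-offs are just the 256 and 768 MiB numbers from the paragraph above, and it only applies to files that are being actively rewritten.

```python
MiB = 1024 * 1024


def autodefrag_outlook(size_bytes: int) -> str:
    """Rule-of-thumb classification using the 256 and 768 MiB figures above."""
    if size_bytes < 256 * MiB:
        return "usually fine"
    if size_bytes <= 768 * MiB:
        return "YMMV"
    return "likely trouble"


for size in (50 * MiB, 512 * MiB, 2048 * MiB):
    print(size // MiB, "MiB:", autodefrag_outlook(size))
```

Where a given file in the middle band actually lands still depends on the speed of the hardware and how fast the rewrites come in.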
