Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-24 Thread Marc MERLIN
On Sun, Jul 22, 2012 at 11:42:03PM -0700, Marc MERLIN wrote:
 I just realized that the older thread got a bit confusing, so I'll keep
 problems separate and make things simpler :)
 
Since yesterday, I have tried other kernels, including no-preempt,
voluntary-preempt and full-preempt builds of 3.4.4.
I also tried a default 3.2.0 kernel from Debian (all amd64), but that did
not help. I'm still seeing close to 25 seconds to scan 15K files.

How can it possibly be so slow?
More importantly, how can I provide useful debug information?

- I don't think it's a problem with the kernel since I tried 4 kernels,
  including a default debian one.

- Alignment seems OK; I made sure the partition start was divisible by 512:
/dev/sda2          502272    52930559    26214144   83  Linux

- I tried another brand new btrfs, and things are even slower now.
gandalfthegreat:/mnt/mnt2# mount -o ssd,discard,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# reset_cache 
gandalfthegreat:/mnt/mnt2# time du -sh src/
514M    src/
real    0m29.584s
gandalfthegreat:/mnt/mnt2# find src/| wc -l
15261

This is bad enough that there ought to be a way to debug this, right?

Can you suggest something?

Thanks,
Marc

 On an _unencrypted_ partition on the SSD, running du -sh on a directory
 with 15K files, takes 23 seconds on unencrypted SSD and 4 secs on
 encrypted spinning drive, both with a similar btrfs filesystem, and 
 the same kernel (3.4.4).
 
 Unencrypted btrfs on SSD:
 gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
 gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m22.667s
 
 Encrypted btrfs on spinning drive of the same src directory:
 gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m3.881s
 
 I've run this many times and get the same numbers.
 I've tried deadline and noop on /dev/sda (the SSD) and du is just as slow.  
 
 I also tried with:
 - space_cache and nospace_cache
 - ssd and nossd
 - noatime didn't seem to help even though I was hopeful on this one.
 
 In all cases, I get:
 gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m22.537s
 
 
 I'm seeing the same slow speed on 2 btrfs filesystems on the same SSD.
 One is encrypted, the other one isn't:
 Label: 'btrfs_pool1'  uuid: d570c40a-4a0b-4d03-b1c9-cff319fc224d
   Total devices 1 FS bytes used 144.74GB
   devid    1 size 441.70GB used 195.04GB path /dev/dm-0
 
 Label: 'boot'  uuid: 84199644-3542-430a-8f18-a5aa58959662
   Total devices 1 FS bytes used 2.33GB
   devid    1 size 25.00GB used 5.04GB path /dev/sda2
 
 If instead of stating a bunch of files, I try reading a big file, I do get
 speeds that are quite fast (253MB/s and 423MB/s).
 
 22 seconds for 15K files on an SSD is super slow, and it is 5 times
 slower than a spinning disk with the same data.
 What's going on?
 
 Thanks,
 Marc

-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
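
To put the timings from the message above on a per-file basis, here is a small arithmetic sketch. It assumes that every du run covered the same 15261 entries reported by find; the function names are made up for illustration. Under that assumption the SSD run costs about 1.5 ms per file and the spinning drive about 0.25 ms per file, a ratio of roughly 5.8.

```c
#include <stdio.h>

/* Average wall-clock cost of one file, in milliseconds. */
double per_file_ms(double total_seconds, unsigned long files)
{
    return total_seconds * 1000.0 / (double)files;
}

/* How many times slower the first run is than the second. */
double slowdown(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Print one summary line for a timed du run. */
void report(const char *label, double seconds, unsigned long files)
{
    printf("%-28s %7.3f s  %.2f ms/file\n",
           label, seconds, per_file_ms(seconds, files));
}
```

Calling report("SSD, unencrypted", 22.667, 15261) and report("spinning, encrypted", 3.881, 15261) prints one line per run. A per-file cost above a millisecond on an SSD suggests the time goes into something other than raw device reads, which is what the thread is trying to pin down.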


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-24 Thread Martin Steigerwald
On Monday, 23 July 2012, Marc MERLIN wrote:
 I just realized that the older thread got a bit confusing, so I'll keep
 problems separate and make things simpler :)
 
 On an _unencrypted_ partition on the SSD, running du -sh on a directory
 with 15K files, takes 23 seconds on unencrypted SSD and 4 secs on
 encrypted spinning drive, both with a similar btrfs filesystem, and
 the same kernel (3.4.4).
 
 Unencrypted btrfs on SSD:
 gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
 gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m22.667s
 
 Encrypted btrfs on spinning drive of the same src directory:
 gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m3.881s

find is fast, du is much slower:

merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
404166
( find /usr | wc -l; )  0,03s user 0,07s system 1% cpu 9,212 total
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )
11G     /usr
( du -sh /usr; )  1,00s user 19,07s system 41% cpu 48,886 total


Now I'll try something with fewer files.

merkaba:~> find /usr/share/doc | wc -l
50715
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr/share/doc | wc -l )
50715
( find /usr/share/doc | wc -l; )  0,00s user 0,02s system 1% cpu 1,398 total
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr/share/doc )
606M    /usr/share/doc
( du -sh /usr/share/doc; )  0,20s user 3,63s system 35% cpu 10,691 total

merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time du -sh /usr/share/doc
606M    /usr/share/doc
du -sh /usr/share/doc  0,19s user 3,54s system 35% cpu 10,386 total


Anyway, that's still much faster than your measurements.



merkaba:~> df -hT /usr
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/dm-0      btrfs   19G   11G  5,6G   67% /
merkaba:~> btrfs fi sh

failed to read /dev/sr0
Label: 'debian'  uuid: […]
        Total devices 1 FS bytes used 10.25GB
        devid    1 size 18.62GB used 18.62GB path /dev/dm-0

Btrfs Btrfs v0.19
merkaba:~> btrfs fi df /
Data: total=15.10GB, used=9.59GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=670.43MB
Metadata: total=8.00MB, used=0.00


merkaba:~> grep btrfs /proc/mounts
/dev/dm-0 / btrfs rw,noatime,compress=lzo,ssd,space_cache,inode_cache 0 0


Somewhat aged BTRFS filesystem on ThinkPad T520, Intel SSD 320, kernel 
3.5.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


[PATCH 1/2] Btrfs-progs: search subvolumes with proper objectid

2012-07-24 Thread Liu Bo
Btrfs's subvolume/snapshot is limited to
[BTRFS_FIRST_FREE_OBJECTID, BTRFS_LAST_FREE_OBJECTID], so just apply the range.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 btrfs-list.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index c53d016..ac6507a 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -634,11 +634,13 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
    sk->max_type = BTRFS_ROOT_BACKREF_KEY;
    sk->min_type = BTRFS_ROOT_BACKREF_KEY;
 
+   sk->min_objectid = BTRFS_FIRST_FREE_OBJECTID;
+
    /*
     * set all the other params to the max, we'll take any objectid
     * and any trans
     */
-   sk->max_objectid = (u64)-1;
+   sk->max_objectid = BTRFS_LAST_FREE_OBJECTID;
    sk->max_offset = (u64)-1;
    sk->max_transid = (u64)-1;
 
@@ -690,7 +692,7 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
            if (sk->min_type < BTRFS_ROOT_BACKREF_KEY) {
                    sk->min_type = BTRFS_ROOT_BACKREF_KEY;
                    sk->min_offset = 0;
-           } else  if (sk->min_objectid < (u64)-1) {
+           } else  if (sk->min_objectid < BTRFS_LAST_FREE_OBJECTID) {
                    sk->min_objectid++;
                    sk->min_type = BTRFS_ROOT_BACKREF_KEY;
                    sk->min_offset = 0;
-- 
1.6.5.2
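
The range that this patch applies can be sketched as a standalone check. The two constants carry the values btrfs defines for them (256 and -256 as unsigned 64-bit numbers); the helper name is_subvol_objectid is hypothetical and does not exist in btrfs-progs.

```c
#include <stdint.h>

/* Values as defined in btrfs's ctree.h. */
#define BTRFS_FIRST_FREE_OBJECTID 256ULL
#define BTRFS_LAST_FREE_OBJECTID  -256ULL

/* Hypothetical helper: nonzero if id lies in the subvolume/snapshot range. */
int is_subvol_objectid(uint64_t id)
{
    return id >= BTRFS_FIRST_FREE_OBJECTID && id <= BTRFS_LAST_FREE_OBJECTID;
}
```

With these bounds, small reserved tree ids such as 5 and the last 255 values below (u64)-1 fall outside the range, which is why the search no longer needs to run up to (u64)-1.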



[PATCH 2/2] Btrfs-progs: show generation in command btrfs subvol list

2012-07-24 Thread Liu Bo
This adds the ability to show root's generation when we use btrfs subvol list.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 btrfs-list.c |   61 -
 1 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index ac6507a..05360dc 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -57,6 +57,9 @@ struct root_info {
    /* the dir id we're in from ref_tree */
    u64 dir_id;
 
+   /* generation when the root is created or last updated */
+   u64 gen;
+
    /* path from the subvol we live in to this root, including the
     * root's name.  This is null until we do the extra lookup ioctl.
     */
@@ -194,6 +197,19 @@ static int add_root(struct root_lookup *root_lookup,
    return 0;
 }
 
+static int update_root(struct root_lookup *root_lookup, u64 root_id, u64 gen)
+{
+   struct root_info *ri;
+
+   ri = tree_search(&root_lookup->root, root_id);
+   if (!ri || ri->root_id != root_id) {
+           fprintf(stderr, "could not find subvol %llu\n", root_id);
+           return -ENOENT;
+   }
+   ri->gen = gen;
+   return 0;
+}
+
 /*
  * for a given root_info, search through the root_lookup tree to construct
  * the full path name to it.
@@ -615,11 +631,15 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
    struct btrfs_ioctl_search_key *sk = &args.key;
    struct btrfs_ioctl_search_header *sh;
    struct btrfs_root_ref *ref;
+   struct btrfs_root_item *ri;
    unsigned long off = 0;
    int name_len;
    char *name;
    u64 dir_id;
+   u8 type;
+   u64 gen = 0;
    int i;
+   int get_gen = 0;
 
    root_lookup_init(root_lookup);
    memset(&args, 0, sizeof(args));
@@ -644,6 +664,7 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
    sk->max_offset = (u64)-1;
    sk->max_transid = (u64)-1;
 
+again:
    /* just a big number, doesn't matter much */
    sk->nr_items = 4096;
 
@@ -665,7 +686,7 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
                    sh = (struct btrfs_ioctl_search_header *)(args.buf +
                                                              off);
                    off += sizeof(*sh);
-                   if (sh->type == BTRFS_ROOT_BACKREF_KEY) {
+                   if (!get_gen && sh->type == BTRFS_ROOT_BACKREF_KEY) {
                            ref = (struct btrfs_root_ref *)(args.buf + off);
                            name_len = btrfs_stack_root_ref_name_len(ref);
                            name = (char *)(ref + 1);
@@ -673,6 +694,11 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
 
                            add_root(root_lookup, sh->objectid, sh->offset,
                                     dir_id, name, name_len);
+                   } else if (get_gen && sh->type == BTRFS_ROOT_ITEM_KEY) {
+                           ri = (struct btrfs_root_item *)(args.buf + off);
+                           gen = btrfs_root_generation(ri);
+
+                           update_root(root_lookup, sh->objectid, gen);
                    }
 
                    off += sh->len;
@@ -689,17 +715,38 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup)
            /* this iteration is done, step forward one root for the next
             * ioctl
             */
-           if (sk->min_type < BTRFS_ROOT_BACKREF_KEY) {
-                   sk->min_type = BTRFS_ROOT_BACKREF_KEY;
+           if (get_gen)
+                   type = BTRFS_ROOT_ITEM_KEY;
+           else
+                   type = BTRFS_ROOT_BACKREF_KEY;
+
+           if (sk->min_type < type) {
+                   sk->min_type = type;
                    sk->min_offset = 0;
            } else  if (sk->min_objectid < BTRFS_LAST_FREE_OBJECTID) {
                    sk->min_objectid++;
-                   sk->min_type = BTRFS_ROOT_BACKREF_KEY;
+                   sk->min_type = type;
                    sk->min_offset = 0;
            } else
                    break;
    }
 
+   if (!get_gen) {
+           memset(&args, 0, sizeof(args));
+
+           sk->tree_id = 1;
+           sk->max_type = BTRFS_ROOT_ITEM_KEY;
+           sk->min_type = BTRFS_ROOT_ITEM_KEY;
+
+           sk->min_objectid = BTRFS_FIRST_FREE_OBJECTID;
+
+           sk->max_objectid = BTRFS_LAST_FREE_OBJECTID;
+           sk->max_offset = (u64)-1;
+           sk->max_transid = (u64)-1;
+
+           get_gen = 1;
+           goto again;
+   }
    return 0;
 }
 
@@ -781,13 +828,15 @@ int list_subvols(int fd, int print_parent, int get_default)
 
            resolve_root(&root_lookup, entry, &parent_id, &level, &path);
            if (print_parent) {
- 

Re: [PATCH v3 1/1] Btrfs: Check INCOMPAT flags on remount and add helper function

2012-07-24 Thread David Sterba
We don't need a helper for every incompatibility bit, let's do it in a
more generic way as suggested below [modulo syntax errors]:

On Fri, Jul 20, 2012 at 05:16:41PM -0500, Mitch Harder wrote:
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -3103,6 +3103,19 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, const char *function,
  unsigned int line, int errno);
  
 +static inline void btrfs_chk_lzo_incompat(struct btrfs_root *root)
 +{

btrfs_set_fs_incompat(struct btrfs_fs_info *fs_info, u64 flag) {

 + struct btrfs_super_block *disk_super;
 + u64 features;
 +
 + disk_super = root->fs_info->super_copy;

disk_super = fs_info->super_copy;

 + features = btrfs_super_incompat_flags(disk_super);
 + if (!(features & BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
 + features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;

if (!(features & flag)) {
features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;

 + btrfs_set_super_incompat_flags(disk_super, features);
 + }
 +}
 +
  #define btrfs_abort_transaction(trans, root, errno)  \
  do { \
   __btrfs_abort_transaction(trans, root, __func__,\
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 17facea..d5fd69e 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 u64 newer_than, unsigned long max_to_defrag)
  {
   struct btrfs_root *root = BTRFS_I(inode)->root;
 - struct btrfs_super_block *disk_super;
   struct file_ra_state *ra = NULL;
   unsigned long last_index;
   u64 isize = i_size_read(inode);
 - u64 features;
   u64 last_len = 0;
   u64 skip = 0;
   u64 defrag_end = 0;
 @@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
   mutex_unlock(&inode->i_mutex);
   }
  
 - disk_super = root->fs_info->super_copy;
 - features = btrfs_super_incompat_flags(disk_super);
   if (range->compress_type == BTRFS_COMPRESS_LZO) {
 - features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
 - btrfs_set_super_incompat_flags(disk_super, features);
 + btrfs_chk_lzo_incompat(root);

btrfs_set_fs_incompat(fs_info, BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO);

   }
  
   ret = defrag_count;
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 26da344..32c2bd9 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
   compress_type = "lzo";
   info->compress_type = BTRFS_COMPRESS_LZO;
   btrfs_set_opt(info->mount_opt, COMPRESS);
 + btrfs_chk_lzo_incompat(root);

btrfs_set_fs_incompat(fs_info, BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO);

   } else if (strncmp(args[0].from, "no", 2) == 0) {
   compress_type = "no";
   info->compress_type = BTRFS_COMPRESS_NONE;


[PATCH v4] Btrfs: Check INCOMPAT flags on remount and add helper function

2012-07-24 Thread Mitch Harder
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.

Signed-off-by: Mitch Harder <mitch.har...@sabayonlinux.org>
---
v1->v2
- Remove extraneous formatting change.
v2->v3
- Consolidate into a single patch
- Convert helper function to a static inline function.
v3->v4
- Per feedback from Li Zefan, change function name from _chk_ to _set_
- Per feedback from David Sterba, make the helper function more generic.
- The more generic function can also be implemented in the INCOMPAT
  check made for setting the default subvolume.

 fs/btrfs/ctree.h |   17 +++++++++++++++++
 fs/btrfs/ioctl.c |   16 ++--------------
 fs/btrfs/super.c |    1 +
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..5422e54 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3103,6 +3103,23 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
                               struct btrfs_root *root, const char *function,
                               unsigned int line, int errno);
 
+#define btrfs_set_fs_incompat(__fs_info, opt) \
+   __btrfs_set_fs_incompat((__fs_info), BTRFS_FEATURE_INCOMPAT_##opt)
+
+static inline void __btrfs_set_fs_incompat(struct btrfs_fs_info *fs_info,
+                                           u64 flag)
+{
+   struct btrfs_super_block *disk_super;
+   u64 features;
+
+   disk_super = fs_info->super_copy;
+   features = btrfs_super_incompat_flags(disk_super);
+   if (!(features & flag)) {
+           features |= flag;
+           btrfs_set_super_incompat_flags(disk_super, features);
+   }
+}
+
 #define btrfs_abort_transaction(trans, root, errno)            \
 do {                                                           \
    __btrfs_abort_transaction(trans, root, __func__,        \
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..0d5d079 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
                      u64 newer_than, unsigned long max_to_defrag)
 {
    struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_super_block *disk_super;
    struct file_ra_state *ra = NULL;
    unsigned long last_index;
    u64 isize = i_size_read(inode);
-   u64 features;
    u64 last_len = 0;
    u64 skip = 0;
    u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
            mutex_unlock(&inode->i_mutex);
    }
 
-   disk_super = root->fs_info->super_copy;
-   features = btrfs_super_incompat_flags(disk_super);
    if (range->compress_type == BTRFS_COMPRESS_LZO) {
-           features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-           btrfs_set_super_incompat_flags(disk_super, features);
+           btrfs_set_fs_incompat(root->fs_info, COMPRESS_LZO);
    }
 
    ret = defrag_count;
@@ -2761,8 +2756,6 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
    struct btrfs_path *path;
    struct btrfs_key location;
    struct btrfs_disk_key disk_key;
-   struct btrfs_super_block *disk_super;
-   u64 features;
    u64 objectid = 0;
    u64 dir_id;
 
@@ -2813,12 +2806,7 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
    btrfs_mark_buffer_dirty(path->nodes[0]);
    btrfs_free_path(path);
 
-   disk_super = root->fs_info->super_copy;
-   features = btrfs_super_incompat_flags(disk_super);
-   if (!(features & BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL)) {
-           features |= BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL;
-           btrfs_set_super_incompat_flags(disk_super, features);
-   }
+   btrfs_set_fs_incompat(root->fs_info, DEFAULT_SUBVOL);
    btrfs_end_transaction(trans, root);
 
    return 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..75ee2c7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
                            compress_type = "lzo";
                            info->compress_type = BTRFS_COMPRESS_LZO;
                            btrfs_set_opt(info->mount_opt, COMPRESS);
+                           btrfs_set_fs_incompat(info, COMPRESS_LZO);
                    } else if (strncmp(args[0].from, "no", 2) == 0) {
                            compress_type = "no";
                            info->compress_type = BTRFS_COMPRESS_NONE;
-- 
1.7.8.6
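
The set-if-missing pattern of the helper in this patch can be tried outside the kernel with a minimal userspace sketch. The flag bit values match the kernel's definitions; struct toy_super, its writes counter and the toy_* names are hypothetical stand-ins for the superblock copy and its accessors, added only so the behaviour can be observed.

```c
#include <stdint.h>

/* Bit values as in the kernel's ctree.h. */
#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
#define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO   (1ULL << 3)

/* Hypothetical stand-in for the in-memory superblock copy. */
struct toy_super {
    uint64_t incompat_flags;
    unsigned int writes;   /* counts how often the flags field was stored */
};

/* Set `flag` only if it is missing, so repeated calls do not rewrite the field. */
static inline void toy_set_incompat_flag(struct toy_super *sb, uint64_t flag)
{
    if (!(sb->incompat_flags & flag)) {
        sb->incompat_flags |= flag;
        sb->writes++;
    }
}

/* Token-pasting wrapper, mirroring the shape of the macro in the patch. */
#define toy_set_incompat(sb, opt) \
    toy_set_incompat_flag((sb), BTRFS_FEATURE_INCOMPAT_##opt)
```

Calling toy_set_incompat(&sb, COMPRESS_LZO) twice stores the flag once. The token pasting keeps call sites short, at the cost of hiding the full constant name from a plain text search.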


Re: [PATCH] Xfstests: add btrfs snapshot function test

2012-07-24 Thread David Sterba
On Sat, Jul 21, 2012 at 11:46:00AM +0800, Liu Bo wrote:
 From: Zhou Bo <zhoub-f...@cn.fujitsu.com>
 
 This patch adds btrfs snapshot function test to xfstests.
 
 Signed-off-by: Zhou Bo <zhoub-f...@cn.fujitsu.com>
 ---
  285 |  365 
 +++
  285.out |2 +
  group   |1 +
  3 files changed, 368 insertions(+), 0 deletions(-)
  create mode 100755 285
  create mode 100644 285.out
 
 diff --git a/285 b/285
 new file mode 100755
 index 000..d247af3
 --- /dev/null
 +++ b/285
 @@ -0,0 +1,365 @@
 +#! /bin/bash
 +# FS QA Test No. 285
 +#
 +# Test btrfs's subvolume and snapshot function

There is already one subvolume/snapshot test, which is simple and basic.
The new one is much more extensive, so it needs a more verbose
description.

 +#
 +#---
 +# Copyright (c) 2012 Fujitsu.  All Rights Reserved.
 +#
 +# This program is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU General Public License as
 +# published by the Free Software Foundation.
 +#
 +# This program is distributed in the hope that it would be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program; if not, write the Free Software Foundation,
 +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 +#
 +#---
 +#
 +# creator
 +owner=zhoub-f...@cn.fujitsu.com
 +
 +n=0
 +seq=`basename $0`
 +echo "QA output created by $seq"
 +
 +here=`pwd`
 +tmp=/tmp/$$
 +status=0 # success is the default!
 +
 +_cleanup()
 +{
 +rm -f $tmp.*
 +}
 +
 +trap "_cleanup ; exit \$status" 0 1 2 3 15
 +
 +# get standard environment, filters and checks
 +. ./common.rc
 +. ./common.filter
 +
 +# real QA test starts here
 +_supported_fs btrfs
 +_supported_os Linux
 +_require_scratch
 +
 +_scratch_mkfs_sized `expr 1024 \* 1024 \* 1024` > /dev/null 2>&1

Just curious, is there a reason why you create a 1G filesystem? This
would imply a --mixed type of fs.

 +_scratch_mount
 +
 +_prepare_snapshot()
 +{
 + _scratch_remount > /dev/null
 + btrfs sub snap $SCRATCH_MNT $SCRATCH_MNT/basesnapshot > /dev/null 2>$here/$seq.full
 + btrfs sub snap -r $SCRATCH_MNT $SCRATCH_MNT/readonlysnapshot > /dev/null 2>$here/$seq.full

Here and below: please type the full subcommands, i.e.

btrfs subvolume snapshot

although the short ones are allowed, using full names is future-proof
when there could be another subcommand with the same short prefix.

 + _scratch_unmount > /dev/null 2>$here/$seq.full
 + VALID_SUBVOLUME=basesnapshot
 + VALID_RO_SUBVOLUME=readonlysnapshot
 + SNAPSHOTSTR=snapshot
 + FILE1=file1-
 + FILE2=file2-
 + MVFILE2=newfile2-
 + DIR1=dir1-
 + DIR2=dir2-
 + MVDIR2=newdir2-
 + MVSNAPSHOT=mvsnapshot-
 + SRCSUBVOL=srcsubvol-
 +}
 +
 +_parse_options()
 +{
 + SOURCE_TARGET=$1
 + case $SOURCE_TARGET in
 + 1)
 + SOURCE_SUBVOLUME=$VALID_SUBVOLUME
 + ;;
 + esac
 + SOURCE_READ=$2
 + case $SOURCE_READ in
 + 1)
 + SOURCE_SUBVOLUME=$VALID_RO_SUBVOLUME
 + ;;
 + esac
 + DESTINATION_TARGET=$3
 + case $DESTINATION_TARGET in
 + 1)
 + DESTINATION_SUBVOLUME=$SNAPSHOTSTR$n
 + ;;
 + esac
 + DESTINATION_READ=$4
 + case $DESTINATION_READ in
 + 1)
 + SNAPSHOTOPT_STR=-r

not that it matters much, SNAPSHOT_OPT_STR would look more consistent
with other variable names, like MOUNT_OPT_STR

 + ;;
 + 2)
 + SNAPSHOTOPT_STR=
 + ;;
 + esac
 + MOUNT_OPT=$5
 + case $MOUNT_OPT in
 + 1)
 + MOUNT_OPT_STR=
 + ;;
 + 2)
 + MOUNT_OPT_STR=-r
 + ;;
 + 3)
 + MOUNT_OPT_STR="-o nodatacow"
 + ;;
 + esac
 + FILE_OPERATION_OPT=$6
 + SNAPSHOT_ACTION_OPT=$7
 + TEST_DIR1=$DIR1$n
 + TEST_DIR2=$DIR2$n
 + TEST_MVDIR2=$MVDIR2$n
 + TEST_FILE1=$FILE1$n
 + TEST_FILE2=$FILE2$n
 + TEST_MVFILE2=$MVFILE2$n
 + TEST_MVSNAPSHOT=$MVSNAPSHOT$n
 + SRC_SUBVOLUME=$SRCSUBVOL$n
 + n=$[n+1]
 +}
 +
 +_create_file()
 +{
 + mkdir $SRC_SUBVOLUME/$TEST_DIR1 $SRC_SUBVOLUME/$TEST_DIR2 > /dev/null
 + touch $SRC_SUBVOLUME/$TEST_FILE1 $SRC_SUBVOLUME/$TEST_FILE2 > /dev/null
 +}
 +
 +_do_file_operation()
 +{
 + btrfs filesystem balance $SCRATCH_MNT > /dev/null 2>&1

although 'btrfs filesystem balance /mnt' works, please use 

Re: [PATCH v4] Btrfs: Check INCOMPAT flags on remount and add helper function

2012-07-24 Thread David Sterba
On Tue, Jul 24, 2012 at 12:58:43PM -0500, Mitch Harder wrote:
 In support of the recently added capability to remount with lzo
 compression, provide a helper function to check the compression
 INCOMPAT flags when remounting with lzo compression, and set
 the flags if necessary.
 
 Also, implement the new helper function when defragmenting with
 explicit lzo compression and when setting the default subvolume.
 
 Signed-off-by: Mitch Harder <mitch.har...@sabayonlinux.org>

Thanks!

Reviewed-by: David Sterba <dste...@suse.cz>


Re: btrfs send/receive: if new inode ino is less than its new directory ino, incorrect path is sent

2012-07-24 Thread Alexander Block
On Wed, Jul 18, 2012 at 7:45 PM, Alex Lyakas <alex.bolshoy.bt...@gmail.com> wrote:
 Hi Alexander,
 I am testing different scenarios in order to better understand the
 non-trivial magic of
 get_cur_path()/will_overwrite_ref()/did_overwrite_ref()/did_overwrite_first_ref().
 I hit the following issue, when testing full-send:

 This is my source subvolume (inode numbers are written):
 tree -A  --inodes --noreport /mnt/src/tmp/
 /mnt/src/tmp/
 └── [270]  dir2
     └── [268]  file1_nod

 As you see, ino(file1_nod) < ino(dir2). It is very easy to
 achieve: first create the file, then the dir, and then move the file
 to dir.

 During send the following happens (I augmented the send code with many 
 prints):

 file1_nod is sent first. Since it's a new inode, it is sent as an
 orphan. When recording its reference, __record_new_ref() calls
 get_cur_path() for its parent (270). Then __get_cur_name_and_parent()
 is called on 270, which calls is_inode_existent(), which calls
 get_cur_inode_state(), and the state of the parent is will_create.
 So __get_cur_name_and_parent() creates an orphan name for it, and
 finally the new reference for 268 is recorded as:
 o270-136-0/file1_nod:

 [changed_cb:4102] key(256 INODE_ITEM 0) : NEW
 [changed_cb:4102] key(256 INODE_REF 256) : NEW
 [changed_cb:4102] key(268 INODE_ITEM 0) : NEW
 [send_create_inode:2407] NEW ino(268,135) type=010, path=[o268-135-0]
 [changed_cb:4102] key(268 INODE_REF 270) : NEW
 [get_cur_inode_state:1475] (270,136): L(EX,136)
 R(NE,18446744072099047770) sp=268 == will_create
 [is_inode_existent:1498] (270,136): NOT existent
 [__get_cur_name_and_parent:1918] ino(270,136) not existent = unique
 name [o270-136-0]
 [get_cur_path:2051] ino(0,0) cur_path=[o270-136-0]
 [__record_new_ref:2911] record new ref [o270-136-0/file1_nod]

 Then process_recorded_refs() sees that 268 is still orphan, so it
 sends rename to its valid place, but the problem is that its parent
 dir was not sent yet (and its parent dir is also an orphan):
 [process_recorded_refs:2601] ino(268,135): start with refs
 [28118.347602] [process_recorded_refs:2651] ino(268,135): new=1,
 did_overwrite_first_ref=0, is_orphan=1, valid_path=[o268-135-0]
 [28118.347605] [process_recorded_refs:2701] ino(268,135): is orphan,
 move it: [o268-135-0]=[o270-136-0/file1_nod]
 [28118.347610] [process_recorded_refs:2837] checking dir(270,136)
 [28118.347612] [process_recorded_refs:2869] ino(268,135) done with refs

 Now the parent dir is processed:
 [changed_cb:4102] key(270 INODE_ITEM 0) : NEW
 [send_create_inode:2407] NEW ino(270,136) type=04, path=[o270-136-0]
 [changed_cb:4102] key(270 INODE_REF 256) : NEW
 [get_cur_path:2051] ino(256,133) cur_path=[]
 [__record_new_ref:2911] record new ref [dir2]
 [process_recorded_refs:2601] ino(270,136): start with refs
 [process_recorded_refs:2651] ino(270,136): new=1,
 did_overwrite_first_ref=0, is_orphan=1, valid_path=[o270-136-0]
 [process_recorded_refs:2701] ino(270,136): is orphan, move it:
 [o270-136-0]=[dir2]
 [process_recorded_refs:2837] checking dir(256,133)
 [get_cur_inode_state:1475] (256,133): L(EX,133)
 R(NE,18446612135413283512) sp=270 == did_create
 [process_recorded_refs:2869] ino(270,136) done with refs

 Nothing special here, the parent is first sent as an orphan, and then
 renamed to its valid name, but it's too late.

 During receive:
 ERROR: rename o268-135-0 -> o270-136-0/file1_nod failed. No such file
 or directory

 I am not yet sure where is the proper place to fix this, I just wanted
 to report it first. Basically, I think that when sending any kind of
 A_PATH, it is needed to ensure that path components exist, either as
 orphan or real path (by sending them out-of-order if needed?). But I
 am not yet sure where is the core place that should ensure this.

 Thanks,
 Alex.

I have pushed a fix for this case. Basically, the solution is to
postpone the processing of refs in directories that are not yet created
until the directory is created. Big thanks for investigating this one.
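
The ordering rule behind that fix can be pictured with a toy model. This is only a sketch under stated assumptions, not the kernel's send code: toy_inode, toy_ref_order and TOY_ROOT_INO are hypothetical names, inodes are assumed to arrive in ascending inode-number order, and the subvolume root (256 in the report) is assumed to exist already on the receiving side.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ROOT_INO 256ULL   /* assumed to exist on the receiving side */

struct toy_inode {
    uint64_t ino;      /* inode number */
    uint64_t parent;   /* inode number of the directory it lives in */
};

/*
 * Walk items[] (sorted by ascending ino) and write to out[] the order in
 * which each inode's ref may be processed. A ref is processed at once if
 * the parent directory is the root or has a lower ino (already created);
 * otherwise it is postponed until the walk reaches the parent directory.
 * Returns the number of entries written to out[].
 */
size_t toy_ref_order(const struct toy_inode *items, size_t n, uint64_t *out)
{
    size_t emitted = 0;

    for (size_t i = 0; i < n; i++) {
        /* items[i] is created now; process its own ref if possible */
        if (items[i].parent == TOY_ROOT_INO ||
            items[i].parent < items[i].ino)
            out[emitted++] = items[i].ino;

        /* flush refs that were postponed waiting for this directory */
        for (size_t j = 0; j < i; j++)
            if (items[j].parent == items[i].ino)
                out[emitted++] = items[j].ino;
    }
    return emitted;
}
```

For the case in the report, items {268, parent 270} and {270, parent 256} yield the order 270, then 268, so the rename of file1_nod is only issued once dir2 exists.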


Re: [RFC PATCH 6/6] Btrfs-progs: add btrfs send/receive commands

2012-07-24 Thread Alexander Block
On Thu, Jul 19, 2012 at 3:25 PM, Alex Lyakas <alex.bolshoy.bt...@gmail.com> wrote:
 +static int process_link(const char *path, const char *lnk, void *user)
 +{
 +   int ret;
 +   struct btrfs_receive *r = user;
 +   char *full_path = path_cat(r->full_subvol_path, path);
 +
 +   if (g_verbose >= 1)
 +           fprintf(stderr, "link %s -> %s\n", path, lnk);
 +
 +   ret = link(lnk, full_path);
 +   if (ret < 0) {
 +           ret = -errno;
 +           fprintf(stderr, "ERROR: link %s -> %s failed. %s\n", path,
 +                   lnk, strerror(-ret));
 +   }

 Actually it has to be:
 char *full_link_path = path_cat(r->full_subvol_path, lnk);
 ...
 ret = link(full_path/*oldpath*/, full_link_path/*newpath*/);
 ...
 free(full_link_path);

 Thanks,
 Alex.

Actually, the paths got mixed up in-kernel. You'll find a pushed fix
in the kernel repo. I also pushed a fix to btrfs-progs containing the
full_link_path. Thanks again :)
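
The point about resolving both link() arguments against the subvolume path can be sketched in isolation. The helper names join_path and link_in_subvol are hypothetical; this is not the btrfs-progs code, and which stream field ends up as oldpath or newpath is exactly what the fixes mentioned above settle.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'ed "dir/name", or NULL on failure. */
char *join_path(const char *dir, const char *name)
{
    size_t len = strlen(dir) + 1 + strlen(name) + 1;
    char *out = malloc(len);

    if (out)
        snprintf(out, len, "%s/%s", dir, name);
    return out;
}

/*
 * Create the hard link new_rel pointing at existing_rel, with both paths
 * taken relative to subvol_root. Returns 0 on success or -errno.
 */
int link_in_subvol(const char *subvol_root, const char *existing_rel,
                   const char *new_rel)
{
    char *oldpath = join_path(subvol_root, existing_rel);
    char *newpath = join_path(subvol_root, new_rel);
    int ret = -ENOMEM;

    if (oldpath && newpath) {
        ret = link(oldpath, newpath);
        if (ret < 0)
            ret = -errno;
    }
    free(oldpath);
    free(newpath);
    return ret;
}
```

Passing a relative name straight to link() would resolve it against the current working directory of the receiving process rather than the subvolume, which is the kind of failure reported in this thread.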


No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)

2012-07-24 Thread Marios Titas
When I create a btrfs volume of size strictly less than 256 MiB and then run
mount /dev/sdb1 /mnt/test
the kernel unsuccessfully tries many other file systems before it succeeds
with btrfs. For volumes of 256 MiB or larger it just mounts the volume
without doing that. Why the discrepancy?

Another possibly related symptom is that the volume does not appear in
/dev/disk/by-label and /dev/disk/by-uuid at all, which makes it impossible
to mount the volume by UUID or label.

To make sure that this isn't a udev bug, I booted my system with
init=/bin/bash on the kernel command line and then tried again to mount the
volume. This time it would not mount at all unless I explicitly specified
the fs type. On the other hand, larger volumes mounted without any issues.

All the experiments were done on an initially zeroed-out disk. I am using a
3.4.6 kernel with btrfs from 3.5 and the latest btrfs-progs from git.


Upgrading from 2.6.38, how?

2012-07-24 Thread Gareth Pye
Firstly I know what I've been doing has been less than 100% safe, but
I've been prepared to live with it.

For about 2 years now (you know from around the time btrfs looked like
RAID5/6 was just around the corner) I've had a server with a 5 disk
RAID10 btrfs array. I realise there has been quite some change to the
btrfs implementation since 2.6.38 but I'm hoping that there shouldn't
be anything blocking me moving to a much more modern kernel.

My proposed upgrade method is:
Boot from a live CD with the latest kernel I can find so I can do a few tests:
 A - run the fsck in read only mode to confirm things look good
 B - mount read only, confirm that I can read files well
 C - mount read write, confirm working
Install latest OS, upgrade to latest kernel, then repeat above steps.

Any likely hiccups with the above procedure and suggested alternatives?

-- 
Gareth Pye
Level 2 Judge, Melbourne, Australia
Australian MTG Forum: mtgau.com
gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com
Dear God, I would like to file a bug report


Re: Upgrading from 2.6.38, how?

2012-07-24 Thread Fajar A. Nugraha
On Wed, Jul 25, 2012 at 11:39 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
 My proposed upgrade method is:
 Boot from a live CD with the latest kernel I can find so I can do a few tests:
  A - run the fsck in read only mode to confirm things look good
  B - mount read only, confirm that I can read files well
  C - mount read write, confirm working
 Install latest OS, upgrade to latest kernel, then repeat above steps.

 Any likely hiccups with the above procedure and suggested alternatives?

I'd simply install the new OS on a new partition/subvol. This is what
I did when upgrading from natty -> oneiric -> precise.

IIRC there are some incompatibilities (e.g. space/inode cache disk
format?) but newer kernels will just do the right thing, drop the old
cache and create a new one.

-- 
Fajar