Re: [PATCH] Btrfs: use arg gfp_mask to decide how to allocate tree mod

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 15:58 (+0200), Wang Shilong wrote:
 It seems the original code doesn't pass the right gfp_t argument to decide
 how to allocate.
 With only this patch applied, fsstress fails. So please ignore this patch;
 I will resend it later.

That's in fact what the comment above the line you changed implies :-)
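
The comment Jan refers to says the flags parameter should be honored once the
code moves away from spin locks. A minimal userspace sketch of that
constraint follows, assuming the allocation is made while a spin lock is held.
The enum and function names are hypothetical stand-ins, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp_t flags; not the real values. */
enum model_gfp { MODEL_GFP_ATOMIC, MODEL_GFP_NOFS };

/*
 * An allocation that may sleep must not be issued while a spin lock is
 * held, so the caller's requested mode is only honored when no spin lock
 * is held. Otherwise the non-sleeping (atomic) mode is forced.
 */
enum model_gfp effective_gfp(enum model_gfp requested, bool spinlock_held)
{
	if (spinlock_held)
		return MODEL_GFP_ATOMIC;
	return requested;
}
```

Under this model, passing the caller's flags straight through is only safe
once the spin lock is replaced by a lock that allows sleeping.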

-Jan

 Thanks,
 Wang
 
 From: Wang Shilong wangsl-f...@cn.fujitsu.com

 We have passed arg gfp_mask to tree_mod_alloc(), so
 just use it rather than always use GFP_ATOMIC.

 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 ---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index de6de8e..0e3514f 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
  	 * once we switch from spin locks to something different, we should
  	 * honor the flags parameter here.
  	 */
 -	tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
 +	tm = *tm_ret = kzalloc(sizeof(*tm), flags);
  	if (!tm)
  		return -ENOMEM;

 -- 
 1.7.11.7

 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


Re: Possible to deduplicate read-only snapshots for space-efficient backups

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 12:07 (+0200), Kai Krakow wrote:
 I'm using a bash/rsync script[1] to back up my whole system on a nightly
 basis to an attached USB3 drive into a scratch area, then take a snapshot of
 this area. I'd like to have these snapshots immutable, so they should be
 read-only.

Have you considered using btrfs send / receive for that purpose? That way you
would not need the dedup step at all.

-Jan


Re: [PATCH] Btrfs: use arg gfp_mask to decide how to allocate tree mod

2013-05-06 Thread Wang Shilong
Hello Jan,

 On Sun, May 05, 2013 at 15:58 (+0200), Wang Shilong wrote:
 It seems the original code doesn't pass the right gfp_t argument to decide
 how to allocate.
 With only this patch applied, fsstress fails. So please ignore this patch;
 I will resend it later.
 
 That's in fact what the comment above the line you changed implies :-)


Er.. it seems tree_mod_alloc() will always allocate with GFP_ATOMIC in your code.
However, I think we should try our best not to allocate with GFP_ATOMIC.
Otherwise, at the very least, the gfp_t argument you pass is useless and should
be removed ^_^


Thanks,
Wang

 
 -Jan
 
 Thanks,
 Wang

 From: Wang Shilong wangsl-f...@cn.fujitsu.com

 We have passed arg gfp_mask to tree_mod_alloc(), so
 just use it rather than always use GFP_ATOMIC.

 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 ---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index de6de8e..0e3514f 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
  	 * once we switch from spin locks to something different, we should
  	 * honor the flags parameter here.
  	 */
 -	tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
 +	tm = *tm_ret = kzalloc(sizeof(*tm), flags);
  	if (!tm)
  		return -ENOMEM;

 -- 
 1.7.11.7



Re: [PATCH] Btrfs: use arg gfp_mask to decide how to allocate tree mod

2013-05-06 Thread Wang Shilong
Oops, I mistyped my work email address wangsl-f...@cn.fujitsu.com as
wangsl-f...@fujitsu.com.
So please do not cc that address. Sorry, my bad.

Thanks,
Wang

 Hello Jan,
 
 On Sun, May 05, 2013 at 15:58 (+0200), Wang Shilong wrote:
 It seems the original code doesn't pass the right gfp_t argument to decide
 how to allocate.
 With only this patch applied, fsstress fails. So please ignore this patch;
 I will resend it later.
 That's in fact what the comment above the line you changed implies :-)
 
 
 Er.. it seems tree_mod_alloc() will always allocate with GFP_ATOMIC in your
 code.
 However, I think we should try our best not to allocate with GFP_ATOMIC.
 Otherwise, at the very least, the gfp_t argument you pass is useless and
 should be removed ^_^
 
 
 Thanks,
 Wang
 
 -Jan

 Thanks,
 Wang

 From: Wang Shilong wangsl-f...@cn.fujitsu.com

 We have passed arg gfp_mask to tree_mod_alloc(), so
 just use it rather than always use GFP_ATOMIC.

 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 ---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index de6de8e..0e3514f 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
  	 * once we switch from spin locks to something different, we should
  	 * honor the flags parameter here.
  	 */
 -	tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
 +	tm = *tm_ret = kzalloc(sizeof(*tm), flags);
  	if (!tm)
  		return -ENOMEM;

 -- 
 1.7.11.7



Re: Possible to deduplicate read-only snapshots for space-efficient backups

2013-05-06 Thread Kai Krakow
Jan Schmidt <list.bt...@jan-o-sch.net> wrote:

 I'm using a bash/rsync script[1] to back up my whole system on a nightly
 basis to an attached USB3 drive into a scratch area, then take a snapshot
 of this area. I'd like to have these snapshots immutable, so they should
 be read-only.
 
 Have you considered using btrfs send / receive for that purpose? That way
 you would not need the dedup step at all.

This is planned for later. As a first step I want to stay as file-system
agnostic as possible regarding the source. But I've put it on my todo list in
the gist.

Regards,
Kai



Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Kai Krakow
Josef Bacik <jba...@fusionio.com> wrote:

 I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
 I'm running bedup[1] on a regular basis and it is now the third time that
 I got back to my PC just to find it hard-frozen and I needed to use the
 reset button.
 
 It looks like this happens only while running bedup on my two btrfs
 filesystems but I'm not sure if it happens for any of the filesystems or
 only one. This is my setup:

[snip]

 Can you please file a bug for this issue on bugzilla.kernel.org so I can
 make sure we don't lose track of it?  Make sure the component is set to
 Btrfs.

Meanwhile I found out that it happens not only during dedup with bedup but
also when creating my rsync backup. I will file all the details in bugzilla
this evening.

Thanks,
Kai



[RFC 0/5] BTRFS hot relocation support

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

  This patchset is sent as an RFC mainly to check whether it is heading
in the right development direction.

  The patchset introduces hot relocation support for BTRFS. In a hybrid
storage environment, when data on the HDD becomes hot, BTRFS hot
relocation can move it to the SSD automatically. If the SSD usage ratio
exceeds its upper threshold, data that has become cold is looked up and
relocated to the HDD first to free space on the SSD, and then the data
that has become hot is relocated to the SSD automatically.

  BTRFS hot relocation works in four steps: it first reserves block space
on the SSD, loads the hot data from the HDD into the page cache, allocates
block space on the SSD, and finally writes the data to the SSD.
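
The policy described above can be sketched as a small decision function.
This is only a userspace illustration, not code from the patchset: the names
decide_reloc and reloc_action are hypothetical, and treating "cold" as "at or
below the heat threshold" is an assumption.

```c
#include <assert.h>

/* Hypothetical actions for one tracked file range. */
enum reloc_action { RELOC_NONE, RELOC_TO_SSD, RELOC_TO_HDD };

/*
 * temp        heat value of the range
 * thresh      heat threshold above which data counts as hot
 * on_ssd      nonzero if the range currently lives on the SSD
 * ssd_ratio   SSD usage in percent
 * high_water  SSD usage (percent) that counts as space pressure
 */
enum reloc_action decide_reloc(int temp, int thresh, int on_ssd,
			       int ssd_ratio, int high_water)
{
	/* Hot data that still sits on the HDD is a candidate for the SSD. */
	if (!on_ssd && temp > thresh)
		return RELOC_TO_SSD;

	/* Cold data leaves the SSD only when the SSD is under space pressure. */
	if (on_ssd && temp <= thresh && ssd_ratio >= high_water)
		return RELOC_TO_HDD;

	return RELOC_NONE;
}
```

For example, with a threshold of 108 a range on the HDD with temperature 109
is selected for the SSD, while with a threshold of 150 it stays where it is.
The sketch looks at one range at a time and does not model the ordering of
cold moves before hot moves.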

  If you'd like to play with it, please pull the patchset from
my git tree on GitHub:
  https://github.com/wuzhy/kernel.git hot_reloc

For usage, please refer to the example below:

root@debian-i386:~# echo 0 > /sys/block/vdc/queue/rotational
^^^ The above command makes /dev/vdc appear to be an SSD
root@debian-i386:~# echo 99 > /proc/sys/fs/hot-age-interval
root@debian-i386:~# echo 10 > /proc/sys/fs/hot-update-interval
root@debian-i386:~# echo 10 > /proc/sys/fs/hot-reloc-interval
root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc -f
 
WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
 
[ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1 transid 16 /dev/vdb
[ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
[ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
[ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
[ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
adding device /dev/vdc id 2
[ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2 transid 3 /dev/vdc
fs created label (null) on /dev/vdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
Btrfs v0.20-rc1-254-gb0136aa-dirty
root@debian-i386:~# mount -o hot_move /dev/vdb /data2
[ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 6 /dev/vdb
[ 144.870444] btrfs: disk space caching is enabled
[ 144.904214] VFS: Turning on hot data tracking
root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
root@debian-i386:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        16G   13G  2.2G  86% /
tmpfs           4.8G     0  4.8G   0% /lib/init/rw
udev             10M  176K  9.9M   2% /dev
tmpfs           4.8G     0  4.8G   0% /dev/shm
/dev/vdb         15G  2.0G   13G  14% /data2
root@debian-i386:~# btrfs fi df /data2
Data: total=3.01GB, used=2.00GB
System: total=4.00MB, used=4.00KB
Metadata: total=8.00MB, used=2.19MB
Data_SSD: total=8.00MB, used=0.00
root@debian-i386:~# echo 108 > /proc/sys/fs/hot-reloc-threshold
^^^ The above command triggers hot relocation, because the data temperature
is currently 109
root@debian-i386:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        16G   13G  2.2G  86% /
tmpfs           4.8G     0  4.8G   0% /lib/init/rw
udev             10M  176K  9.9M   2% /dev
tmpfs           4.8G     0  4.8G   0% /dev/shm
/dev/vdb         15G  2.1G   13G  14% /data2
root@debian-i386:~# btrfs fi df /data2
Data: total=3.01GB, used=6.25MB
System: total=4.00MB, used=4.00KB
Metadata: total=8.00MB, used=2.26MB
Data_SSD: total=2.01GB, used=2.00GB
root@debian-i386:~# 

Zhi Yong Wu (5):
  vfs: add one list_head field
  btrfs: add one new block group
  btrfs: add one hot relocation kthread
  procfs: add three proc interfaces
  btrfs: add hot relocation support

 fs/btrfs/Makefile            |   3 +-
 fs/btrfs/ctree.h             |  26 +-
 fs/btrfs/extent-tree.c       | 107 +-
 fs/btrfs/extent_io.c         |  31 +-
 fs/btrfs/extent_io.h         |   4 +
 fs/btrfs/file.c              |  36 +-
 fs/btrfs/hot_relocate.c      | 802 +++
 fs/btrfs/hot_relocate.h      |  48 +++
 fs/btrfs/inode-map.c         |  13 +-
 fs/btrfs/inode.c             |  92 -
 fs/btrfs/ioctl.c             |  23 +-
 fs/btrfs/relocation.c        |  14 +-
 fs/btrfs/super.c             |  30 +-
 fs/btrfs/volumes.c           |  28 +-
 fs/hot_tracking.c            |   1 +
 include/linux/hot_tracking.h |   1 +
 kernel/sysctl.c              |  22 ++
 18 files changed, 1234 insertions(+), 51 deletions(-)
 create mode 100644 fs/btrfs/hot_relocate.c
 create mode 100644 fs/btrfs/hot_relocate.h

-- 
1.7.11.7



[RFC 1/5] vfs: add one list_head field

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

  Add one list_head field 'reloc_list' to accommodate
hot relocation support.

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 fs/hot_tracking.c| 1 +
 include/linux/hot_tracking.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 3b0002c..7071ac8 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -41,6 +41,7 @@ static void hot_comm_item_init(struct hot_comm_item *ci, int type)
 	clear_bit(HOT_IN_LIST, &ci->delete_flag);
 	clear_bit(HOT_DELETING, &ci->delete_flag);
 	INIT_LIST_HEAD(&ci->track_list);
+	INIT_LIST_HEAD(&ci->reloc_list);
 	memset(&ci->hot_freq_data, 0, sizeof(struct hot_freq_data));
 	ci->hot_freq_data.avg_delta_reads = (u64) -1;
 	ci->hot_freq_data.avg_delta_writes = (u64) -1;
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index 2272975..49f901c 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -68,6 +68,7 @@ struct hot_comm_item {
 	struct rb_node rb_node;			/* rbtree index */
 	unsigned long delete_flag;
 	struct list_head track_list;		/* link to *_map[] */
+	struct list_head reloc_list;		/* used in hot relocation */
 };
 
 /* An item representing an inode and its access frequency */
-- 
1.7.11.7



[RFC 2/5] btrfs: add one new block group

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

  Introduce a new block group type, BTRFS_BLOCK_GROUP_DATA_SSD,
which distinguishes whether block space is reserved and allocated
from an HDD or from an SSD.

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 fs/btrfs/Makefile       |   3 +-
 fs/btrfs/ctree.h        |  24 ++-
 fs/btrfs/extent-tree.c  | 107 +++-
 fs/btrfs/extent_io.c    |  31 --
 fs/btrfs/extent_io.h    |   4 ++
 fs/btrfs/file.c         |  36 +---
 fs/btrfs/hot_relocate.c |  78 +++
 fs/btrfs/hot_relocate.h |  31 ++
 fs/btrfs/inode-map.c    |  13 +-
 fs/btrfs/inode.c        |  92 +
 fs/btrfs/ioctl.c        |  23 +--
 fs/btrfs/relocation.c   |  14 ++-
 fs/btrfs/super.c        |   3 +-
 fs/btrfs/volumes.c      |  28 -
 14 files changed, 439 insertions(+), 48 deletions(-)
 create mode 100644 fs/btrfs/hot_relocate.c
 create mode 100644 fs/btrfs/hot_relocate.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 3932224..94f1ea5 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o
+  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
+  hot_relocate.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 701dec5..f4c4419 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -961,6 +961,16 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5		(1 << 7)
 #define BTRFS_BLOCK_GROUP_RAID6		(1 << 8)
+/*
+ * New block groups for use with hot data relocation feature. When hot data
+ * relocation is on, *_SSD block groups are forced to nonrotating drives and
+ * the plain DATA and METADATA block groups are forced to rotating drives.
+ *
+ * This should be further optimized, i.e. force metadata to SSD or relocate
+ * inode metadata to SSD when any of its subfile ranges are relocated to SSD
+ * so that reads and writes aren't delayed by HDD seeks.
+ */
+#define BTRFS_BLOCK_GROUP_DATA_SSD	(1ULL << 9)
 #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
 
 enum btrfs_raid_types {
@@ -976,7 +986,8 @@ enum btrfs_raid_types {
 
 #define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |    \
 					 BTRFS_BLOCK_GROUP_SYSTEM |  \
-					 BTRFS_BLOCK_GROUP_METADATA)
+					 BTRFS_BLOCK_GROUP_METADATA | \
+					 BTRFS_BLOCK_GROUP_DATA_SSD)
 
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
@@ -1508,6 +1519,7 @@ struct btrfs_fs_info {
 	struct list_head space_info;
 
 	struct btrfs_space_info *data_sinfo;
+	struct btrfs_space_info *hot_data_sinfo;
 
 	struct reloc_control *reloc_ctl;
 
@@ -1532,6 +1544,7 @@ struct btrfs_fs_info {
 	u64 avail_data_alloc_bits;
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
+	u64 avail_data_ssd_alloc_bits;
 
 	/* restriper state */
 	spinlock_t balance_lock;
@@ -1544,6 +1557,7 @@ struct btrfs_fs_info {
 
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
+	unsigned data_ssd_chunk_allocations;
 
 	void *bdev_holder;
 
@@ -1901,6 +1915,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
 #define BTRFS_MOUNT_HOT_TRACK		(1 << 23)
+#define BTRFS_MOUNT_HOT_MOVE		(1 << 24)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
@@ -1922,6 +1937,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_INODE_NOATIME		(1 << 9)
 #define BTRFS_INODE_DIRSYNC		(1 << 10)
 #define BTRFS_INODE_COMPRESS		(1 << 11)
+#define BTRFS_INODE_HOT			(1 << 12)
 
 #define BTRFS_INODE_ROOT_ITEM_INIT	(1 << 31)
 
@@ -3014,6 +3030,8 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
 			  u64 objectid, u64 offset, u64 bytenr);
+struct btrfs_block_group_cache 

[RFC 4/5] procfs: add three proc interfaces

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

  Add three proc interfaces, hot-reloc-interval, hot-reloc-threshold,
and hot-reloc-max-items, under /proc/sys/fs/ in order to make
HOT_RELOC_INTERVAL, HOT_RELOC_THRESHOLD, and HOT_RELOC_MAX_ITEMS
tunable.
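
A userspace tool could read these tunables back with a small helper such as
the sketch below. It is an illustration only: read_tunable is a hypothetical
name, the files exist only on a kernel carrying this patchset, and the
fallback values are the defaults that appear in the patch (150, 120, 250).

```c
#include <assert.h>
#include <stdio.h>

/* Defaults taken from the patch text. */
#define DEFAULT_HOT_RELOC_THRESHOLD 150
#define DEFAULT_HOT_RELOC_INTERVAL  120
#define DEFAULT_HOT_RELOC_MAX_ITEMS 250

/*
 * Read one integer from a sysctl-style file. If the file is missing or
 * does not start with an integer, return the supplied default instead.
 */
int read_tunable(const char *path, int def)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return def;
	if (fscanf(f, "%d", &val) != 1)
		val = def;
	fclose(f);
	return val;
}
```

A caller would use it as
read_tunable("/proc/sys/fs/hot-reloc-threshold", DEFAULT_HOT_RELOC_THRESHOLD),
which simply yields 150 on a kernel without the patchset.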

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 fs/btrfs/hot_relocate.c | 26 +-
 fs/btrfs/hot_relocate.h |  4 
 include/linux/btrfs.h   |  4 
 kernel/sysctl.c | 22 ++
 4 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/hot_relocate.c b/fs/btrfs/hot_relocate.c
index 683e154..aa8c9f0 100644
--- a/fs/btrfs/hot_relocate.c
+++ b/fs/btrfs/hot_relocate.c
@@ -25,7 +25,7 @@
  * The relocation code below operates on the heat map lists to identify
  * hot or cold data logical file ranges that are candidates for relocation.
  * The triggering mechanism for relocation is controlled by a global heat
- * threshold integer value (HOT_RELOC_THRESHOLD). Ranges are
+ * threshold integer value (sysctl_hot_reloc_threshold). Ranges are
  * queued for relocation by the periodically executing relocate kthread,
  * which updates the global heat threshold and responds to space pressure
  * on the SSDs.
@@ -52,6 +52,15 @@
  * (assuming, critically, the HOT_MOVE option is set at mount time).
  */
 
+int sysctl_hot_reloc_threshold = 150;
+EXPORT_SYMBOL_GPL(sysctl_hot_reloc_threshold);
+
+int sysctl_hot_reloc_interval __read_mostly = 120;
+EXPORT_SYMBOL_GPL(sysctl_hot_reloc_interval);
+
+int sysctl_hot_reloc_max_items __read_mostly = 250;
+EXPORT_SYMBOL_GPL(sysctl_hot_reloc_max_items);
+
 static void hot_set_extent_bits(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached_state,
 		gfp_t mask, int storage_type, int flag)
@@ -165,7 +174,7 @@ static int hot_calc_ssd_ratio(struct hot_reloc *hot_reloc)
 static int hot_update_threshold(struct hot_reloc *hot_reloc,
 				int update)
 {
-	int thresh = hot_reloc->thresh;
+	int thresh = sysctl_hot_reloc_threshold;
 	int ratio = hot_calc_ssd_ratio(hot_reloc);
 
 	/* Sometimes update global threshold, others not */
@@ -189,7 +198,7 @@ static int hot_update_threshold(struct hot_reloc *hot_reloc,
 			thresh = 0;
 	}
 
-	hot_reloc->thresh = thresh;
+	sysctl_hot_reloc_threshold = thresh;
 	return ratio;
 }
 
@@ -280,7 +289,7 @@ static int hot_queue_extent(struct hot_reloc *hot_reloc,
 		hot_comm_item_put(ci);
 		spin_unlock(&he->i_lock);
 
-		if (*counter >= HOT_RELOC_MAX_ITEMS)
+		if (*counter >= sysctl_hot_reloc_max_items)
 			break;
 
 		if (kthread_should_stop()) {
@@ -361,7 +370,7 @@ again:
 	while (1) {
 		lock_extent(tree, page_start, page_end);
 		ordered = btrfs_lookup_ordered_extent(inode,
-							page_start);
+							  page_start);
 		unlock_extent(tree, page_start, page_end);
 		if (!ordered)
 			break;
@@ -642,7 +651,7 @@ void hot_do_relocate(struct hot_reloc *hot_reloc)
 
 	run++;
 	ratio = hot_update_threshold(hot_reloc, !(run % 15));
-	thresh = hot_reloc->thresh;
+	thresh = sysctl_hot_reloc_threshold;
 
 	INIT_LIST_HEAD(&hot_reloc->hot_relocq[TYPE_NONROT]);
 
@@ -652,7 +661,7 @@ void hot_do_relocate(struct hot_reloc *hot_reloc)
 	if (count_to_hot == 0)
 		return;
 
-	count_to_cold = HOT_RELOC_MAX_ITEMS;
+	count_to_cold = sysctl_hot_reloc_max_items;
 
 	/* Don't move cold data to HDD unless there's space pressure */
 	if (ratio < HIGH_WATER_LEVEL)
@@ -734,7 +743,7 @@ static int hot_relocate_kthread(void *arg)
 	unsigned long delay;
 
 	do {
-		delay = HZ * HOT_RELOC_INTERVAL;
+		delay = HZ * sysctl_hot_reloc_interval;
 		if (mutex_trylock(&hot_reloc->hot_reloc_mutex)) {
 			hot_do_relocate(hot_reloc);
 			mutex_unlock(&hot_reloc->hot_reloc_mutex);
@@ -766,7 +775,6 @@ int hot_relocate_init(struct btrfs_fs_info *fs_info)
 
 	fs_info->hot_reloc = hot_reloc;
 	hot_reloc->fs_info = fs_info;
-	hot_reloc->thresh = HOT_RELOC_THRESHOLD;
 	for (i = 0; i < MAX_RELOC_TYPES; i++)
 		INIT_LIST_HEAD(&hot_reloc->hot_relocq[i]);
 	mutex_init(&hot_reloc->hot_reloc_mutex);
diff --git a/fs/btrfs/hot_relocate.h b/fs/btrfs/hot_relocate.h
index 077d9b3..ca30944 100644
--- a/fs/btrfs/hot_relocate.h
+++ b/fs/btrfs/hot_relocate.h
@@ -24,9 +24,6 @@ enum {
MAX_RELOC_TYPES
 };
 
-#define HOT_RELOC_INTERVAL  120
-#define HOT_RELOC_THRESHOLD 150
-#define HOT_RELOC_MAX_ITEMS 250
 
 #define HEAT_MAX_VALUE	(MAP_SIZE - 1)
 

[RFC 5/5] btrfs: add hot relocation support

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

  Add a new mount option '-o hot_move' for hot relocation support.
When hot relocation is enabled, hot tracking is enabled automatically.
  Its usage looks like:
	mount -o hot_move
	mount -o nouser,hot_move
	mount -o nouser,hot_move,loop
	mount -o hot_move,nouser
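
The way hot_move implies hot tracking can be sketched in userspace as
follows. This is an illustration, not the kernel parser: parse_hot_opts and
needs_hot_tracking are hypothetical names, the bit positions mirror the
BTRFS_MOUNT_HOT_TRACK and BTRFS_MOUNT_HOT_MOVE defines from patch 2/5, and
unrelated options are simply ignored here.

```c
#include <assert.h>
#include <string.h>

/* Bit positions follow the BTRFS_MOUNT_* defines quoted in patch 2/5. */
#define OPT_HOT_TRACK (1u << 23)
#define OPT_HOT_MOVE  (1u << 24)

/* Scan a comma-separated option string and collect the two hot_* flags. */
unsigned int parse_hot_opts(const char *options)
{
	unsigned int flags = 0;
	const char *p = options;

	while (*p) {
		/* Length of the current option, up to the next comma or end. */
		size_t len = strcspn(p, ",");

		if (len == sizeof("hot_track") - 1 &&
		    strncmp(p, "hot_track", len) == 0)
			flags |= OPT_HOT_TRACK;
		else if (len == sizeof("hot_move") - 1 &&
			 strncmp(p, "hot_move", len) == 0)
			flags |= OPT_HOT_MOVE;

		p += len;
		if (*p == ',')
			p++;
	}
	return flags;
}

/* Tracking must be initialized if either option is set. */
int needs_hot_tracking(unsigned int flags)
{
	return (flags & (OPT_HOT_TRACK | OPT_HOT_MOVE)) != 0;
}
```

With "nouser,hot_move,loop" only the HOT_MOVE bit is set, yet
needs_hot_tracking() still reports that tracking is required, which matches
the "either option" checks the patch adds around hot_track_init() and
hot_track_exit().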

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 fs/btrfs/super.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4cbd0de..b342f6f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -311,8 +311,13 @@ static void btrfs_put_super(struct super_block *sb)
 	 * process...  Whom would you report that to?
 	 */
 
+	/* Hot data relocation */
+	if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_MOVE))
+		hot_relocate_exit(btrfs_sb(sb));
+
 	/* Hot data tracking */
-	if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_TRACK))
+	if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_MOVE)
+		|| btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_TRACK))
 		hot_track_exit(sb);
 }
 
@@ -327,7 +332,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_hot_track,
-	Opt_err,
+	Opt_hot_move, Opt_err,
 };
 
 static match_table_t tokens = {
@@ -368,6 +373,7 @@ static match_table_t tokens = {
 	{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
 	{Opt_fatal_errors, "fatal_errors=%s"},
 	{Opt_hot_track, "hot_track"},
+	{Opt_hot_move, "hot_move"},
 	{Opt_err, NULL},
 };
 
@@ -636,6 +642,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_hot_track:
 			btrfs_set_opt(info->mount_opt, HOT_TRACK);
 			break;
+		case Opt_hot_move:
+			btrfs_set_opt(info->mount_opt, HOT_MOVE);
+			break;
 		case Opt_err:
 			printk(KERN_INFO "btrfs: unrecognized mount option "
 			       "'%s'\n", p);
@@ -863,17 +872,26 @@ static int btrfs_fill_super(struct super_block *sb,
 		goto fail_close;
 	}
 
-	if (btrfs_test_opt(fs_info->tree_root, HOT_TRACK)) {
+	if (btrfs_test_opt(fs_info->tree_root, HOT_MOVE)
+		|| btrfs_test_opt(fs_info->tree_root, HOT_TRACK)) {
 		err = hot_track_init(sb);
 		if (err)
 			goto fail_hot;
 	}
 
+	if (btrfs_test_opt(fs_info->tree_root, HOT_MOVE)) {
+		err = hot_relocate_init(fs_info);
+		if (err)
+			goto fail_reloc;
+	}
+
 	save_mount_options(sb, data);
 	cleancache_init_fs(sb);
 	sb->s_flags |= MS_ACTIVE;
 	return 0;
 
+fail_reloc:
+	hot_track_exit(sb);
 fail_hot:
 	dput(sb->s_root);
 	sb->s_root = NULL;
@@ -974,6 +992,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",fatal_errors=panic");
 	if (btrfs_test_opt(root, HOT_TRACK))
 		seq_puts(seq, ",hot_track");
+	if (btrfs_test_opt(root, HOT_MOVE))
+		seq_puts(seq, ",hot_move");
 	return 0;
 }
 
-- 
1.7.11.7



Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 18:10 (+0200), Kai Krakow wrote:
 Hello list,
 
 Kai Krakow <hurikhan77+bt...@gmail.com> wrote:
 
 I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
 I'm running bedup[1] on a regular basis and it is now the third time that
 I got back to my PC just to find it hard-frozen and I needed to use the
 reset button.

 It looks like this happens only while running bedup on my two btrfs
 filesystems but I'm not sure if it happens for any of the filesystems or
 only one. This is my setup:

 # cat /etc/fstab (shortened)
 UUID=d2bb232a-2e8f-4951-8bcc-97e237f1b536 / btrfs
 compress=lzo,subvol=root64 0 1 # /dev/sd{a,b,c}3
 LABEL=usb-backup /mnt/private/usb-backup btrfs noauto,compress-
 force=zlib,subvolid=0,autodefrag,comment=systemd.automount 0 0 # external
 usb3 disk

 # btrfs filesystem show
 Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
 Total devices 1 FS bytes used 1.13TB
 devid1 size 1.82TB used 1.75TB path /dev/sdd1

 Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
 Total devices 3 FS bytes used 914.43GB
 devid3 size 927.26GB used 426.03GB path /dev/sdc3
 devid2 size 927.26GB used 426.03GB path /dev/sdb3
 devid1 size 927.26GB used 427.07GB path /dev/sda3

 Btrfs v0.20-rc1

 Since the system hard-freezes I have no messages from dmesg. But I suspect
 it to be related to the defragmentation option in bedup (I've switched to
 bedup with --defrag since 3.9.0, and autodefrag for the backup drive).
 Just in case, I'm going to try without this option now and see if it won't
 freeze.

 I was able to take a physical screenshot with a real camera of a kernel
 backtrace one time when the freeze happened. I wonder if it is useful to
 you and where to send it. I just don't want to upload jpegs right here to
 the list without asking first.

 The big plus is: Although I had to hard-reset the frozen system several
 times now, btrfs survived the procedure without any impact (just boot
 times increase noticeably, probably due to log replays or something). So
 thumbs up for the developers on that point.
 
 Thanks to the great cwillu netcat service here's my backtrace:

That one should be fixed in btrfs-next. If you can reliably reproduce the bug,
I'd be glad to get a confirmation - then you can probably even skip filing it
on bugzilla ;-)

-Jan

 4,1072,17508258745,-;[ cut here ]
 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
 4,1074,17508258791,-;invalid opcode:  [#1] SMP 
 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O) 
 vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib 
 snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev 
 coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core 
 lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
 4,1076,17508258966,-;CPU 0 
 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G C O 
 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3
 4,1078,17508259023,-;RIP: 0010:[81161d12]  [81161d12] 
 __tree_mod_log_rewind+0x4c/0x121
 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX: 
 880196671888
 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI: 
 8804087be700
 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09: 
 880196671898
 4,1083,17508259165,-;R10:  R11:  R12: 
 880406c2e000
 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15: 
 0001
 4,1085,17508259218,-;FS:  () GS:88041f20() 
 knlGS:
 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4: 
 000407f0
 4,1088,17508259297,-;DR0:  DR1:  DR2: 
 
 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7: 
 0400
 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo 
 88019667, task 8801b82e5400)
 4,1091,17508259383,-;Stack:
 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000 
 8a11
 4,1093,17508259423,-; 8802d0a14000 81167606 0246 
 8801ee8d33b0
 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360 
 
 4,1095,17508259488,-;Call Trace:
 4,1096,17508259500,-; [81167606] ? 
 btrfs_search_old_slot+0x543/0x61e
 4,1097,17508259526,-; [811692de] ? btrfs_next_old_leaf+0x8a/0x332
 4,1098,17508259552,-; [811c484a] ? 
 __resolve_indirect_refs+0x2d8/0x408
 4,1099,17508259578,-; [811c533b] ? find_parent_nodes+0x9c1/0xcec
 4,1100,17508259602,-; [811c5e06] 

Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Harald Glatt
I have this problem too, but I cannot reproduce it reliably... When is
that patch in btrfs-next going to land in the mainline kernel?

On Mon, May 6, 2013 at 10:55 AM, Jan Schmidt list.bt...@jan-o-sch.net wrote:
 On Sun, May 05, 2013 at 18:10 (+0200), Kai Krakow wrote:
 Hello list,

 Kai Krakow <hurikhan77+bt...@gmail.com> wrote:

 I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
 I'm running bedup[1] on a regular basis and it is now the third time that
 I got back to my PC just to find it hard-frozen and I needed to use the
 reset button.

 It looks like this happens only while running bedup on my two btrfs
 filesystems but I'm not sure if it happens for any of the filesystems or
 only one. This is my setup:

 # cat /etc/fstab (shortened)
 UUID=d2bb232a-2e8f-4951-8bcc-97e237f1b536 / btrfs
 compress=lzo,subvol=root64 0 1 # /dev/sd{a,b,c}3
 LABEL=usb-backup /mnt/private/usb-backup btrfs noauto,compress-
 force=zlib,subvolid=0,autodefrag,comment=systemd.automount 0 0 # external
 usb3 disk

 # btrfs filesystem show
 Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
 Total devices 1 FS bytes used 1.13TB
 devid1 size 1.82TB used 1.75TB path /dev/sdd1

 Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
 Total devices 3 FS bytes used 914.43GB
 devid3 size 927.26GB used 426.03GB path /dev/sdc3
 devid2 size 927.26GB used 426.03GB path /dev/sdb3
 devid1 size 927.26GB used 427.07GB path /dev/sda3

 Btrfs v0.20-rc1

 Since the system hard-freezes I have no messages from dmesg. But I suspect
 it to be related to the defragmentation option in bedup (I've switched to
 bedup with --defrag since 3.9.0, and autodefrag for the backup drive).
 Just in case, I'm going to try without this option now and see if it won't
 freeze.

 I was able to take a physical screenshot with a real camera of a kernel
 backtrace one time when the freeze happened. I wonder if it is useful to
 you and where to send it. I just don't want to upload jpegs right here to
 the list without asking first.

 The big plus is: Although I had to hard-reset the frozen system several
 times now, btrfs survived the procedure without any impact (just boot
 times increase noticeably, probably due to log-replays or something). So
 thumbs up for the developers on that point.

 Thanks to the great cwillu netcat service here's my backtrace:

 That one should be fixed in btrfs-next. If you can reliably reproduce the bug
 I'd be glad to get a confirmation - you can probably even save putting it on
 bugzilla then ;-)

 -Jan

 4,1072,17508258745,-;[ cut here ]
 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
 4,1074,17508258791,-;invalid opcode:  [#1] SMP
 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O)
 vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib
 snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev
 coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core
 lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
 4,1076,17508258966,-;CPU 0
 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G C O
 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3
 4,1078,17508259023,-;RIP: 0010:[81161d12]  [81161d12]
 __tree_mod_log_rewind+0x4c/0x121
 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX:
 880196671888
 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI:
 8804087be700
 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09:
 880196671898
 4,1083,17508259165,-;R10:  R11:  R12:
 880406c2e000
 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15:
 0001
 4,1085,17508259218,-;FS:  () GS:88041f20()
 knlGS:
 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4:
 000407f0
 4,1088,17508259297,-;DR0:  DR1:  DR2:
 
 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7:
 0400
 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo
 88019667, task 8801b82e5400)
 4,1091,17508259383,-;Stack:
 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000
 8a11
 4,1093,17508259423,-; 8802d0a14000 81167606 0246
 8801ee8d33b0
 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360
 
 4,1095,17508259488,-;Call Trace:
 4,1096,17508259500,-; [81167606] ?
 btrfs_search_old_slot+0x543/0x61e
 4,1097,17508259526,-; [811692de] ? btrfs_next_old_leaf+0x8a/0x332
 

[PATCH V3] Btrfs: remove btrfs_sector_sum structure

2013-05-06 Thread Miao Xie
Using the structure btrfs_sector_sum to keep the checksum value is
unnecessary. Because the extents that btrfs_sector_sum points to are
contiguous, we can find the expected checksums from btrfs_ordered_sum's
bytenr and the offset, so btrfs_sector_sum's bytenr can be removed. After
removing bytenr, only one member is left in the structure, so it makes
no sense to keep it; remove it and use a u32 array to store the
checksum values.

With this change, we no longer use a while loop to get the checksums one
by one. We can now get several checksum values at a time, which improved
performance by ~74% on my SSD (31MB/s -> 54MB/s).

test command:
 # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
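
As a rough illustration of the indexing idea described above (a userspace
sketch, not the patch's code): with one base address and a flat u32 array,
the checksum for a sector can be found by offset arithmetic alone. The names
sum_run and sum_lookup are hypothetical and do not exist in btrfs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered-sum record: one base address
 * plus a flat array holding one checksum per sector. */
struct sum_run {
    uint64_t bytenr;       /* disk address of the first covered sector */
    uint32_t len;          /* number of bytes covered by this run */
    const uint32_t *sums;  /* len / sectorsize checksums */
};

/* Store the checksum of the sector starting at addr in *out.
 * Returns 0 on success, -1 if addr is outside the run or unaligned. */
static int sum_lookup(const struct sum_run *run, uint32_t sectorsize,
                      uint64_t addr, uint32_t *out)
{
    uint64_t delta;

    if (addr < run->bytenr || addr - run->bytenr >= run->len)
        return -1;

    delta = addr - run->bytenr;
    if (delta % sectorsize)
        return -1;

    /* No per-sector bytenr is stored; the index follows from the offset. */
    *out = run->sums[delta / sectorsize];
    return 0;
}
```

Because the sectors are contiguous, storing an address next to every
checksum adds nothing that this division cannot recover.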

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v2 -> v3:
- address the problem that the csums were inserted into the wrong range; this
  bug was reported by Josef.

Changelog v1 -> v2:
- rewrite the changelog and the title, which did not explain this patch clearly
- fix the 64bit division problem on 32bit machines
---
 fs/btrfs/file-item.c| 144 ++--
 fs/btrfs/ordered-data.c |  19 +++
 fs/btrfs/ordered-data.h |  25 ++---
 fs/btrfs/relocation.c   |  10 
 fs/btrfs/scrub.c|  16 ++
 5 files changed, 73 insertions(+), 141 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b193bf3..a7bfc95 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -34,8 +34,7 @@
 
 #define MAX_ORDERED_SUM_BYTES(r) ((PAGE_SIZE - \
   sizeof(struct btrfs_ordered_sum)) / \
-  sizeof(struct btrfs_sector_sum) * \
-  (r)-sectorsize - (r)-sectorsize)
+  sizeof(u32) * (r)-sectorsize)
 
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
@@ -297,7 +296,6 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 
start, u64 end,
struct btrfs_path *path;
struct extent_buffer *leaf;
struct btrfs_ordered_sum *sums;
-   struct btrfs_sector_sum *sector_sum;
struct btrfs_csum_item *item;
LIST_HEAD(tmplist);
unsigned long offset;
@@ -368,34 +366,28 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 
start, u64 end,
  struct btrfs_csum_item);
while (start  csum_end) {
size = min_t(size_t, csum_end - start,
-   MAX_ORDERED_SUM_BYTES(root));
+MAX_ORDERED_SUM_BYTES(root));
sums = kzalloc(btrfs_ordered_sum_size(root, size),
-   GFP_NOFS);
+  GFP_NOFS);
if (!sums) {
ret = -ENOMEM;
goto fail;
}
 
-   sector_sum = sums-sums;
sums-bytenr = start;
-   sums-len = size;
+   sums-len = (int)size;
 
offset = (start - key.offset) 
root-fs_info-sb-s_blocksize_bits;
offset *= csum_size;
+   size = root-fs_info-sb-s_blocksize_bits;
 
-   while (size  0) {
-   read_extent_buffer(path-nodes[0],
-   sector_sum-sum,
-   ((unsigned long)item) +
-   offset, csum_size);
-   sector_sum-bytenr = start;
-
-   size -= root-sectorsize;
-   start += root-sectorsize;
-   offset += csum_size;
-   sector_sum++;
-   }
+   read_extent_buffer(path-nodes[0],
+  sums-sums,
+  ((unsigned long)item) + offset,
+  csum_size * size);
+
+   start += root-sectorsize * size;
list_add_tail(sums-list, tmplist);
}
path-slots[0]++;
@@ -417,23 +409,20 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct 
inode *inode,
   struct bio *bio, u64 file_start, int contig)
 {
struct btrfs_ordered_sum *sums;
-   struct btrfs_sector_sum *sector_sum;
struct btrfs_ordered_extent *ordered;
char *data;
struct bio_vec *bvec = bio-bi_io_vec;
int bio_index = 0;
+   int index;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
-   u64 

[PATCH] Btrfs: introduce qgroup_ulist to avoid frequently allocating/freeing ulist

2013-05-06 Thread Wang Shilong
When doing qgroup accounting, we call ulist_alloc()/ulist_free() every time
we want to walk the qgroup tree.

By introducing 'qgroup_ulist', we only need to call ulist_alloc()/ulist_free()
once. This reduces the sys time spent allocating memory; see the measurements
below.
fsstress -p 4 -n 1 -d $dir

With this patch:

real    0m50.153s
user    0m0.081s
sys     0m6.294s

real    0m51.113s
user    0m0.092s
sys     0m6.220s

real    0m52.610s
user    0m0.096s
sys     0m6.125s        avg 6.213
-
Without the patch:

real    0m54.825s
user    0m0.061s
sys     0m10.665s

real    1m6.401s
user    0m0.089s
sys     0m11.218s

real    1m13.768s
user    0m0.087s
sys     0m10.665s       avg 10.849

We can see that the sys time is reduced by ~43%.
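
The allocate-once, reinitialize-per-use pattern can be sketched in userspace
as follows. This is only an illustration under assumptions: struct scratch
and its helpers are hypothetical and much simpler than the kernel's ulist.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reusable scratch list (not the kernel's struct ulist). */
struct scratch {
    unsigned long *vals;
    size_t n;    /* entries currently in use */
    size_t cap;  /* allocated capacity */
};

/* Allocate the backing storage once, e.g. when quota is enabled. */
static int scratch_init(struct scratch *s, size_t cap)
{
    s->vals = malloc(cap * sizeof(*s->vals));
    if (!s->vals)
        return -1;
    s->n = 0;
    s->cap = cap;
    return 0;
}

/* Reset for the next walk without touching the allocator. */
static void scratch_reinit(struct scratch *s)
{
    s->n = 0;
}

/* Append a value; returns -1 when the list is full. */
static int scratch_add(struct scratch *s, unsigned long v)
{
    if (s->n == s->cap)
        return -1;
    s->vals[s->n++] = v;
    return 0;
}

/* Release the storage once, e.g. when quota is disabled. */
static void scratch_free(struct scratch *s)
{
    free(s->vals);
    s->vals = NULL;
    s->n = 0;
    s->cap = 0;
}
```

Each accounting pass then costs a scratch_reinit() instead of an
allocate/free pair, which is the saving the measurements above point at.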

Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |6 
 fs/btrfs/disk-io.c |1 +
 fs/btrfs/qgroup.c  |   70 ++-
 3 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 63c328a..3ccb829 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1594,6 +1594,12 @@ struct btrfs_fs_info {
struct rb_root qgroup_tree;
spinlock_t qgroup_lock;
 
+   /*
+* used to avoid frequently calling ulist_alloc()/ulist_free()
+* when doing qgroup accounting, it must be protected by qgroup_lock.
+*/
+   struct ulist *qgroup_ulist;
+
/* protect user change for quota operations */
struct mutex qgroup_ioctl_lock;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2223494..ee8ce33 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2267,6 +2267,7 @@ int open_ctree(struct super_block *sb,
fs_info-qgroup_seq = 1;
fs_info-quota_enabled = 0;
fs_info-pending_quota_state = 0;
+   fs_info-qgroup_ulist = NULL;
mutex_init(fs_info-qgroup_rescan_lock);
 
btrfs_init_free_cluster(fs_info-meta_alloc_cluster);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9d49c58..7f38cce 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -259,6 +259,12 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
if (!fs_info-quota_enabled)
return 0;
 
+   fs_info-qgroup_ulist = ulist_alloc(GFP_NOFS);
+   if (!fs_info-qgroup_ulist) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
path = btrfs_alloc_path();
if (!path) {
ret = -ENOMEM;
@@ -424,6 +430,9 @@ out:
}
btrfs_free_path(path);
 
+   if (ret)
+   ulist_free(fs_info-qgroup_ulist);
+
return ret  0 ? ret : 0;
 }
 
@@ -460,6 +469,7 @@ void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info)
}
kfree(qgroup);
}
+   ulist_free(fs_info-qgroup_ulist);
 }
 
 static int add_qgroup_relation_item(struct btrfs_trans_handle *trans,
@@ -819,6 +829,12 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
goto out;
}
 
+   fs_info-qgroup_ulist = ulist_alloc(GFP_NOFS);
+   if (!fs_info-qgroup_ulist) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
/*
 * initially create the quota tree
 */
@@ -916,6 +932,8 @@ out_free_root:
kfree(quota_root);
}
 out:
+   if (ret)
+   ulist_free(fs_info-qgroup_ulist);
mutex_unlock(fs_info-qgroup_ioctl_lock);
return ret;
 }
@@ -1355,7 +1373,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle 
*trans,
u64 ref_root;
struct btrfs_qgroup *qgroup;
struct ulist *roots = NULL;
-   struct ulist *tmp = NULL;
u64 seq;
int ret = 0;
int sgn;
@@ -1448,31 +1465,28 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle 
*trans,
/*
 * step 1: for each old ref, visit all nodes once and inc refcnt
 */
-   tmp = ulist_alloc(GFP_ATOMIC);
-   if (!tmp) {
-   ret = -ENOMEM;
-   goto unlock;
-   }
+   ulist_reinit(fs_info-qgroup_ulist);
seq = fs_info-qgroup_seq;
fs_info-qgroup_seq += roots-nnodes + 1; /* max refcnt */
 
-   ret = qgroup_account_ref_step1(fs_info, roots, tmp, seq);
+   ret = qgroup_account_ref_step1(fs_info, roots, fs_info-qgroup_ulist,
+  seq);
if (ret)
goto unlock;
 
/*
 * step 2: walk from the new root
 */
-   ret = qgroup_account_ref_step2(fs_info, roots, tmp, seq, sgn,
-  node-num_bytes, qgroup);
+   ret = qgroup_account_ref_step2(fs_info, roots, fs_info-qgroup_ulist,
+  seq, sgn, node-num_bytes, qgroup);
if (ret)
goto unlock;
 
/*
 * step 3: walk again from old refs
 */
-   ret = 

Re: [PATCH 4/4] btrfs: offline dedupe

2013-05-06 Thread David Sterba
On Tue, Apr 16, 2013 at 03:15:35PM -0700, Mark Fasheh wrote:
 +static void btrfs_double_lock(struct inode *inode1, u64 loff1,
 +   struct inode *inode2, u64 loff2, u64 len)
 +{
 + if (inode1 < inode2) {
 + mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
 + mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 + lock_extent_range(inode1, loff1, len);
 + lock_extent_range(inode2, loff2, len);
 + } else {
 + mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
 + mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
 + lock_extent_range(inode2, loff2, len);
 + lock_extent_range(inode1, loff1, len);
 + }

You can decrease the code size by swapping just the pointers.
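
A rough userspace sketch of that suggestion, assuming hypothetical names
(struct lock_target, order_targets): order the two targets once, then run a
single locking sequence instead of two mirrored branches.

```c
#include <stdint.h>

/* Hypothetical stand-in for an inode plus the offset to lock in it. */
struct lock_target {
    const void *obj;  /* identity used to decide the lock order */
    uint64_t off;
};

/* Leave the target with the lower address in *a, so that callers can
 * always lock a first and b second. */
static void order_targets(struct lock_target *a, struct lock_target *b)
{
    if ((uintptr_t)a->obj > (uintptr_t)b->obj) {
        struct lock_target tmp = *a;

        *a = *b;
        *b = tmp;
    }
}
```

In the kernel the same effect would presumably come from swapping the inode
pointers and offsets before the lock calls; that part is not shown here.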

 +}
 +
 +static long btrfs_ioctl_file_extent_same(struct file *file,
 +  void __user *argp)
 +{
 + struct btrfs_ioctl_same_args *args;
 + struct btrfs_ioctl_same_args tmp;
 + struct btrfs_ioctl_same_extent_info *info;
 + struct inode *src = file-f_dentry-d_inode;
 + struct file *dst_file = NULL;
 + struct inode *dst;
 + u64 off;
 + u64 len;
 + int args_size;
 + int i;
 + int ret;
 + u64 bs = BTRFS_I(src)-root-fs_info-sb-s_blocksize;
 +
 + if (copy_from_user(tmp,
 +(struct btrfs_ioctl_same_args __user *)argp,
 +sizeof(tmp)))
 + return -EFAULT;
 +
 + args_size = sizeof(tmp) + (tmp.total_files *
 + sizeof(struct btrfs_ioctl_same_extent_info));
 +
 + /* Keep size of ioctl argument sane */
 + if (args_size > PAGE_CACHE_SIZE)
 + return -ENOMEM;

Using E2BIG (7, "Argument list too long") makes more sense to me; it's
not really an ENOMEM condition.
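
A minimal sketch of the distinction, assuming a hypothetical helper
check_args_size() and caller-supplied sizes: an oversized request is
rejected with -E2BIG, leaving -ENOMEM for a real allocation failure.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical size check for a variable-length ioctl argument:
 * a fixed header followed by count entries of per_entry bytes. */
static int check_args_size(size_t header, size_t per_entry,
                           unsigned int count, size_t limit)
{
    size_t total = header + (size_t)count * per_entry;

    if (total > limit)
        return -E2BIG;  /* the argument is too long, memory is not short */
    return 0;
}
```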

 +
 + args = kmalloc(args_size, GFP_NOFS);
 + if (!args)
 + return -ENOMEM;

(like here)

 +
 + ret = -EFAULT;
 + if (BTRFS_I(dst)-root != BTRFS_I(src)-root) {
 + printk(KERN_ERR btrfs: cannot dedup across subvolumes
 + %lld\n, info-fd);
 + goto next;
 + }
...
 + info-status = btrfs_extent_same(src, off, len, dst,
 +  info-logical_offset);
 + if (info-status == 0) {
 + info-bytes_deduped = len;
 + args-files_deduped++;
 + } else {
 + printk(KERN_ERR error %d from btrfs_extent_same\n,

missing "btrfs: " prefix

 + info-status);
 + }
 +next:

 --- a/fs/btrfs/ioctl.h
 +++ b/fs/btrfs/ioctl.h
 +/* For extent-same ioctl */
 +struct btrfs_ioctl_same_extent_info {
 + __s64 fd;   /* in - destination file */
 + __u64 logical_offset;   /* in - start of extent in destination */
 + __u64 bytes_deduped;/* out - total # of bytes we were able
 +  * to dedupe from this file */
 + /* status of this dedupe operation:
 +  * 0 if dedup succeeds
 +  * < 0 for error
 +  * == BTRFS_SAME_DATA_DIFFERS if data differs
 +  */
 + __s32 status;   /* out - see above description */
 + __u32 reserved;
 +};
 +
 +struct btrfs_ioctl_same_args {
 + __u64 logical_offset;   /* in - start of extent in source */
 + __u64 length;   /* in - length of extent */
 + __u16 total_files;  /* in - total elements in info array */
 + __u16 files_deduped;/* out - number of files that got deduped */
 + __u32 reserved;

Please add a few more reserved bytes here; we may want to enhance the
call with some fine tunables or extended status. This is an external
interface, so we don't need to count every byte here, and spare room
makes minor future enhancements easier.

 + struct btrfs_ioctl_same_extent_info info[0];
 +};
 +
  struct btrfs_ioctl_space_info {
   __u64 flags;
   __u64 total_bytes;
 @@ -498,5 +523,6 @@ struct btrfs_ioctl_send_args {
 struct btrfs_ioctl_get_dev_stats)
  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
   struct btrfs_ioctl_dev_replace_args)
 -
 +#define BTRFS_IOC_FILE_EXTENT_SAME _IOWR(BTRFS_IOCTL_MAGIC, 54, \
 +  struct btrfs_ioctl_same_args)

Feel free to claim the ioctl number at

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Development_notes.2C_please_read


Re: [PATCH 3/4] btrfs: Introduce extent_read_full_page_nolock()

2013-05-06 Thread David Sterba
On Tue, Apr 16, 2013 at 03:15:34PM -0700, Mark Fasheh wrote:
 @@ -2625,7 +2625,7 @@ static int __extent_read_full_page(struct 
 extent_io_tree *tree,
   }
  
   end = page_end;
 - while (1) {
 + while (1  !parent_locked) {

the patch is ok, just this caught my eye :)


Re: Possible to dedpulicate read-only snapshots for space-efficient backups

2013-05-06 Thread james northrup
tried a git based backup? sounds spot-on as a compromise prior to
applying btrfs tweaks.  snapshotting the git binaries would have the
dedupe characteristics.

On Mon, May 6, 2013 at 12:44 AM, Kai Krakow hurikhan77+bt...@gmail.com wrote:
 Jan Schmidt list.bt...@jan-o-sch.net schrieb:

 I'm using a bash/rsync script[1] to back up my whole system on a nightly
 basis to an attached USB3 drive into a scratch area, then take a snapshot
 of this area. I'd like to have these snapshots immutable, so they should
 be read-only.

 Have you considered using btrfs send / receive for that purpose? You would
 just save the dedup step.

 This is planned for later. In the first step I want to stay as file system
 agnostic for the source as possible. But I've put it on my todo list in the
 gist.

 Regards,
 Kai



[PATCH] btrfs: don't stop searching after encountering the wrong item

2013-05-06 Thread Gabriel de Perthuis
The search ioctl skips items that are too large for a result buffer, but
inline items of a certain size occurring before any search result is
found would trigger an overflow and stop the search entirely.
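
The ordering issue can be sketched in userspace like this. It is only an
illustration under assumptions: BUF_SIZE, HDR_SIZE, struct item and
copy_matching() are hypothetical and far simpler than copy_to_sk().

```c
#include <stddef.h>

#define BUF_SIZE 64  /* hypothetical result buffer size */
#define HDR_SIZE 8   /* hypothetical per-result header size */

struct item {
    int wanted;  /* nonzero if the item matches the search key */
    size_t len;  /* payload length */
};

/* Count the items that fit into the buffer and report the bytes used.
 * The key filter runs before any size check, so an item that does not
 * match can never end the search. */
static size_t copy_matching(const struct item *items, size_t n, size_t *used)
{
    size_t copied = 0;

    *used = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = items[i].len;

        if (!items[i].wanted)
            continue;                      /* filter first */

        if (HDR_SIZE + len > BUF_SIZE)
            len = 0;                       /* too large: header only */

        if (HDR_SIZE + len + *used > BUF_SIZE)
            break;                         /* buffer is full */

        *used += HDR_SIZE + len;
        copied++;
    }
    return copied;
}
```

With the size checks placed before the filter, a non-matching 60-byte item
at the start would already exceed the 64-byte buffer once the header is
added, and the loop would stop before reaching any real result.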

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641

Signed-off-by: Gabriel de Perthuis g2p.code+bt...@gmail.com
---
 fs/btrfs/ioctl.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 95d46cc..b3f0276 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1797,23 +1797,23 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
for (i = slot; i < nritems; i++) {
item_off = btrfs_item_ptr_offset(leaf, i);
item_len = btrfs_item_size_nr(leaf, i);
 
-   if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+   btrfs_item_key_to_cpu(leaf, key, i);
+   if (!key_in_sk(key, sk))
+   continue;
+
+   if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
item_len = 0;
 
if (sizeof(sh) + item_len + *sk_offset >
BTRFS_SEARCH_ARGS_BUFSIZE) {
ret = 1;
goto overflow;
}
 
-   btrfs_item_key_to_cpu(leaf, key, i);
-   if (!key_in_sk(key, sk))
-   continue;
-
sh.objectid = key->objectid;
sh.offset = key->offset;
sh.type = key->type;
sh.len = item_len;
sh.transid = found_transid;
-- 
1.8.2.1.419.ga0b97c6



Re: [PATCH] btrfs: don't stop searching after encountering the wrong item

2013-05-06 Thread Greg KH
On Mon, May 06, 2013 at 07:40:18PM +0200, Gabriel de Perthuis wrote:
 The search ioctl skips items that are too large for a result buffer, but
 inline items of a certain size occurring before any search result is
 found would trigger an overflow and stop the search entirely.
 
 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641
 
 Signed-off-by: Gabriel de Perthuis g2p.code+bt...@gmail.com
 ---
  fs/btrfs/ioctl.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>


Btrfs: wait for quota rescan to complete

2013-05-06 Thread Jan Schmidt
Two small patch sets, one for the kernel and one for user mode. Both are
required to support waiting for a quota rescan to complete.

Jan Schmidt (1):
  Btrfs: add ioctl to wait for qgroup rescan completion

 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/ioctl.c   |   12 
 fs/btrfs/qgroup.c  |   21 +
 include/uapi/linux/btrfs.h |1 +
 4 files changed, 36 insertions(+), 0 deletions(-)


Jan Schmidt (2):
  Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args
  Btrfs-progs: added btrfs quota rescan -w switch (wait)

 cmds-quota.c |   19 +--
 ioctl.h  |2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)


[PATCH 2/2] Btrfs-progs: added btrfs quota rescan -w switch (wait)

2013-05-06 Thread Jan Schmidt
With -w one can wait for a rescan operation to finish. It can be used when
starting a rescan operation or later to wait for the currently running
rescan operation to finish. Waiting is interruptible.
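
The condition under which the tool goes on to wait can be sketched as a
small predicate. The function name should_wait_for_rescan() is hypothetical;
the logic mirrors the check added in the diff below.

```c
#include <errno.h>

/* Decide whether a follow-up wait call is warranted: either the rescan
 * was started successfully (ret == 0) or one was already running
 * (errno == EINPROGRESS). */
static int should_wait_for_rescan(int wait_requested, int ret, int err)
{
    return wait_requested && (ret == 0 || err == EINPROGRESS);
}
```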

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 cmds-quota.c |   19 +--
 ioctl.h  |1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/cmds-quota.c b/cmds-quota.c
index 1169772..6557e83 100644
--- a/cmds-quota.c
+++ b/cmds-quota.c
@@ -90,10 +90,11 @@ static int cmd_quota_disable(int argc, char **argv)
 }
 
 static const char * const cmd_quota_rescan_usage[] = {
-   btrfs quota rescan [-s] path,
+   btrfs quota rescan [-sw] path,
Trash all qgroup numbers and scan the metadata again with the current 
config.,
,
-s   show status of a running rescan operation,
+   -w   wait for rescan operation to finish (can be already in progress),
NULL
 };
 
@@ -105,21 +106,30 @@ static int cmd_quota_rescan(int argc, char **argv)
char *path = NULL;
struct btrfs_ioctl_quota_rescan_args args;
int ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
+   int wait_for_completion = 0;
 
optind = 1;
while (1) {
-   int c = getopt(argc, argv, "s");
+   int c = getopt(argc, argv, "sw");
if (c < 0)
break;
switch (c) {
case 's':
ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
break;
+   case 'w':
+   wait_for_completion = 1;
+   break;
default:
usage(cmd_quota_rescan_usage);
}
}
 
+   if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
+   fprintf(stderr, "ERROR: -w cannot be used with -s\n");
+   return 12;
+   }
+
if (check_argc_exact(argc - optind, 1))
usage(cmd_quota_rescan_usage);
 
@@ -134,6 +144,11 @@ static int cmd_quota_rescan(int argc, char **argv)
 
ret = ioctl(fd, ioctlnum, &args);
e = errno;
+
+   if (wait_for_completion && (ret == 0 || e == EINPROGRESS)) {
+   ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT, &args);
+   e = errno;
+   }
close(fd);
 
if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN) {
diff --git a/ioctl.h b/ioctl.h
index abe6dd4..c260bbf 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -529,6 +529,7 @@ struct btrfs_ioctl_clone_range_args {
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-- 
1.7.1



[PATCH 1/2] Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args

2013-05-06 Thread Jan Schmidt
The patch set previously sent went out together with the kernel part, but
was not updated when I added some reserved bytes to the ioctl struct for
future compatibility. This fixes struct btrfs_ioctl_quota_rescan_args.
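
To illustrate why the userspace copy has to match: ioctl numbers built with
_IOR()/_IOW() encode the size of the argument struct, so a struct without
the reserved words would yield a different request number. A minimal sketch,
using a hypothetical struct rescan_args_sketch with fixed-width types in
place of __u64:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout after the fix below. */
struct rescan_args_sketch {
    uint64_t flags;
    uint64_t progress;
    uint64_t reserved[6];  /* room for future extensions */
};

/* Eight 64-bit words in total; fail the build if the layout drifts. */
_Static_assert(sizeof(struct rescan_args_sketch) == 64,
               "rescan args must stay 64 bytes");
```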

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 ioctl.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 1ee631a..abe6dd4 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -429,6 +429,7 @@ struct btrfs_ioctl_quota_ctl_args {
 struct btrfs_ioctl_quota_rescan_args {
__u64   flags;
__u64   progress;
+   __u64   reserved[6];
 };
 
 struct btrfs_ioctl_qgroup_assign_args {
-- 
1.7.1



[PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-06 Thread Jan Schmidt
btrfs_qgroup_wait_for_completion waits until the currently running qgroup
operation completes. It returns immediately when no rescan process is in
progress. This is useful to automate things around the rescan process (e.g.
testing).

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/ioctl.c   |   12 
 fs/btrfs/qgroup.c  |   21 +
 include/uapi/linux/btrfs.h |1 +
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8624f49..39ca0d9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1607,6 +1607,7 @@ struct btrfs_fs_info {
struct mutex qgroup_rescan_lock; /* protects the progress item */
struct btrfs_key qgroup_rescan_progress;
struct btrfs_workers qgroup_rescan_workers;
+   struct completion qgroup_rescan_completion;
 
/* filesystem state */
unsigned long fs_state;
@@ -3836,6 +3837,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5e93bb8..9161660 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3937,6 +3937,16 @@ static long btrfs_ioctl_quota_rescan_status(struct file 
*file, void __user *arg)
return ret;
 }
 
+static long btrfs_ioctl_quota_rescan_wait(struct file *file, void __user *arg)
+{
+   struct btrfs_root *root = BTRFS_I(fdentry(file)-d_inode)-root;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   return btrfs_qgroup_wait_for_completion(root-fs_info);
+}
+
 static long btrfs_ioctl_set_received_subvol(struct file *file,
void __user *arg)
 {
@@ -4179,6 +4189,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_quota_rescan(file, argp);
case BTRFS_IOC_QUOTA_RESCAN_STATUS:
return btrfs_ioctl_quota_rescan_status(file, argp);
+   case BTRFS_IOC_QUOTA_RESCAN_WAIT:
+   return btrfs_ioctl_quota_rescan_wait(file, argp);
case BTRFS_IOC_DEV_REPLACE:
return btrfs_ioctl_dev_replace(root, argp);
case BTRFS_IOC_GET_FSLABEL:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9d49c58..ebca17a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2068,6 +2068,8 @@ out:
} else {
pr_err(btrfs: qgroup scan failed with %d\n, err);
}
+
+   complete_all(fs_info-qgroup_rescan_completion);
 }
 
 static void
@@ -2108,6 +2110,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
fs_info-qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
memset(fs_info-qgroup_rescan_progress, 0,
sizeof(fs_info-qgroup_rescan_progress));
+   init_completion(fs_info-qgroup_rescan_completion);
 
/* clear all current qgroup tracking information */
for (n = rb_first(fs_info-qgroup_tree); n; n = rb_next(n)) {
@@ -2124,3 +2127,21 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
return 0;
 }
+
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info)
+{
+   int running;
+   int ret = 0;
+
+   mutex_lock(&fs_info->qgroup_rescan_lock);
+   spin_lock(&fs_info->qgroup_lock);
+   running = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+   spin_unlock(&fs_info->qgroup_lock);
+   mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+   if (running)
+   ret = wait_for_completion_interruptible(
+   &fs_info->qgroup_rescan_completion);
+
+   return ret;
+}
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ef0df5..5b683b5 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-- 
1.7.1



Re: [PATCH v2] btrfs: device delete to get errors from the kernel

2013-05-06 Thread Josef Bacik
On Tue, Apr 30, 2013 at 07:19:40AM -0600, Anand Jain wrote:
 v1-v2:
 introduce error codes for the device mgmt usage
 
 v1:
 adds a parameter in the ioctl arg struct to carry the error string
 
 Signed-off-by: Anand Jain anand.j...@oracle.com
 ---

I need a proper log for this patch.  Thanks,

Josef


Re: [PATCH] Btrfs: fix off-by-one in fiemap

2013-05-06 Thread Josef Bacik
On Wed, May 01, 2013 at 10:23:41AM -0600, Liu Bo wrote:
 lock_extent/unlock_extent expect an exclusive end.
 

Can you make an xfstest for this so we can make sure we don't screw this up in
the future?  Thanks,

Josef


Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Kai Krakow
Jan Schmidt list.bt...@jan-o-sch.net schrieb:

 That one should be fixed in btrfs-next. If you can reliably reproduce the
 bug I'd be glad to get a confirmation - you can probably even save putting
 it on bugzilla then ;-)

I can reliably reproduce it with two different approaches. I'd like to
apply only the commits fixing it. Can you name them here?

 4,1072,17508258745,-;[ cut here ]
 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
 4,1074,17508258791,-;invalid opcode:  [#1] SMP
 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O)
 vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib
 snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev
 coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core
 lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
 4,1076,17508258966,-;CPU 0
 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G
 C O 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68
 Pro3
 4,1078,17508259023,-;RIP: 0010:[81161d12]  [81161d12]
 __tree_mod_log_rewind+0x4c/0x121
 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX:
 880196671888
 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI:
 8804087be700
 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09:
 880196671898
 4,1083,17508259165,-;R10:  R11:  R12:
 880406c2e000
 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15:
 0001
 4,1085,17508259218,-;FS:  ()
 GS:88041f20() knlGS:
 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4:
 000407f0
 4,1088,17508259297,-;DR0:  DR1:  DR2:
 
 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7:
 0400
 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo
 88019667, task 8801b82e5400)
 4,1091,17508259383,-;Stack:
 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000
 8a11
 4,1093,17508259423,-; 8802d0a14000 81167606 0246
 8801ee8d33b0
 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360
 
 4,1095,17508259488,-;Call Trace:
 4,1096,17508259500,-; [81167606] ? btrfs_search_old_slot+0x543/0x61e
 4,1097,17508259526,-; [811692de] ? btrfs_next_old_leaf+0x8a/0x332
 4,1098,17508259552,-; [811c484a] ? __resolve_indirect_refs+0x2d8/0x408
 4,1099,17508259578,-; [811c533b] ? find_parent_nodes+0x9c1/0xcec
 4,1100,17508259602,-; [811c5e06] ? iterate_extent_inodes+0xf1/0x23c
 4,1101,17508259628,-; [811837b9] ? btrfs_real_readdir+0x482/0x482
 4,1102,17508259652,-; [81194db7] ? release_extent_buffer.isra.19+0x27/0x88
 4,1103,17508259679,-; [811837b9] ? btrfs_real_readdir+0x482/0x482
 4,1104,17508259703,-; [811c5fda] ? iterate_inodes_from_logical+0x89/0x96
 4,1105,17508259729,-; [811822fc] ? record_extent_backrefs+0x4d/0x8e
 4,1106,17508259755,-; [8118a8af] ? btrfs_finish_ordered_io+0x671/0x798
 4,1107,17508259781,-; [811a33cf] ? worker_loop+0x176/0x493
 4,1108,17508259803,-; [811a3259] ? btrfs_queue_worker+0x272/0x272
 4,1109,17508259827,-; [811a3259] ? btrfs_queue_worker+0x272/0x272
 4,1110,17508259852,-; [810496d2] ? kthread+0x81/0x89
 4,,17508259873,-; [8105] ? free_sched_groups+0x32/0x50
 4,1112,17508259896,-; [81049651] ? kthread_freezable_should_stop+0x36/0x36
 4,1113,17508259924,-; [8151c66c] ? ret_from_fork+0x7c/0xb0
 4,1114,17508259947,-; [81049651] ? kthread_freezable_should_stop+0x36/0x36
 4,1115,17508259974,-;Code: 85 e4 89 c5 0f 85 d6 00 00 00 e9 db 00 00 00
 41 83 7e 28 05 0f 87 ab 00 00 00 41 8b 46 28 ff 24 c5 20 78 62 81 41 39
 6e 2c 73 02 0f 0b 41 8b 56 2c 49 8d 76 38 48 89 df ff c5 e8 7c fb ff ff
 49
 1,1116,17508260117,-;RIP  [81161d12]
 __tree_mod_log_rewind+0x4c/0x121
 4,1117,17508260144,-; RSP 8801966718e8
 4,1118,17508446926,-;---[ end trace e7a8cddfc052e9e9 ]---




Re: [RFC 0/5] BTRFS hot relocation support

2013-05-06 Thread Kai Krakow
zwu.ker...@gmail.com zwu.ker...@gmail.com wrote:

   The patchset is trying to introduce hot relocation support
 for BTRFS. In hybrid storage environment, when the data in
 HDD disk get hot, it can be relocated to SSD disk by BTRFS
 hot relocation support automatically; also, if SSD disk ratio
 exceed its upper threshold, the data which get cold can be
 looked up and relocated to HDD disk to make more space in SSD
 disk at first, and then the data which get hot will be relocated
 to SSD disk automatically.

How will it compare to bcache? I'm currently thinking about buying an SSD,
but bcache requires some effort to migrate the existing storage. And after
all that hassle I'm not even sure it would work easily with a
dracut-generated initramfs.

Bcache seems to be quite clever in its approach. This one looks completely
different and more targeted at relocating data which is used often instead of
trying to reduce head movement. I'm quite happy with the throughput of my 3x
HDD btrfs pool (according to bootchart up to 600 MB/s during boot). A single
SSD would be slower since head movement does not seem to be the issue during
boot. Will this patch relocate such data? Or does it try to relocate only
data which requires random head movement?

Thanks,
Kai



Re: Possible to dedpulicate read-only snapshots for space-efficient backups

2013-05-06 Thread Kai Krakow
james northrup northrup.ja...@gmail.com wrote:

 tried a git based backup? sounds spot-on as a compromise prior to
 applying btrfs tweaks.  snapshotting the git binaries would have the
 dedupe characteristics.

Git is efficient with space, yes. But if you have a lot of binary files, and
a lot of them are big, git becomes really slow really fast. Checking files
out and in can then be very slow and resource-intensive. And I don't think
it would track ownership and permissions correctly.

Git is great, it's an everyday tool for me, but it is just not made for 
binary files.

Regards,
Kai



[PATCH] btrfs-progs: image: handle superblocks correctly on fs with big blocks

2013-05-06 Thread David Sterba
Superblock is always 4k, but metadata blocks may be larger. We have to
use the appropriate block size when doing checksums, otherwise they're
wrong.

Signed-off-by: David Sterba dste...@suse.cz
---
 btrfs-image.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/btrfs-image.c b/btrfs-image.c
index 188291c..dca7a28 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -469,6 +469,16 @@ static int read_data_extent(struct metadump_struct *md,
return 0;
 }
 
+static int is_sb_offset(u64 offset) {
+   switch (offset) {
+   case 65536:
+   case 67108864:
+   case 274877906944:
+   return 1;
+   }
+   return 0;
+}
+
 static int flush_pending(struct metadump_struct *md, int done)
 {
struct async_work *async = NULL;
@@ -506,7 +516,16 @@ static int flush_pending(struct metadump_struct *md, int done)
}
 
while (!md->data && size > 0) {
-   eb = read_tree_block(md->root, start, blocksize, 0);
+   /*
+* We must differentiate between superblock and
+* metadata on filesystems with blocksize > 4k,
+* otherwise the checksum fails for superblock
+*/
+   int bs = blocksize;
+
+   if (is_sb_offset(start))
+   bs = BTRFS_SUPER_INFO_SIZE;
+   eb = read_tree_block(md->root, start, bs, 0);
if (!eb) {
free(async->buffer);
free(async);
@@ -516,9 +535,9 @@ static int flush_pending(struct metadump_struct *md, int done)
}
copy_buffer(async->buffer + offset, eb);
free_extent_buffer(eb);
-   start += blocksize;
-   offset += blocksize;
-   size -= blocksize;
+   start += bs;
+   offset += bs;
+   size -= bs;
}
 
md->pending_start = (u64)-1;
-- 
1.8.2
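
For readers wondering where the three constants in is_sb_offset() come
from: they appear to match the primary superblock at 64 KiB plus the two
mirror copies at 64 MiB and 256 GiB. The sketch below recomputes them,
assuming the usual layout of 16 KiB shifted left by 12 bits per mirror. The
names sb_offset(), is_sb_offset_computed() and the SB_* macros are local
stand-ins for illustration, not identifiers taken from the patch.

```c
#include <stdint.h>

/* Assumed layout constants; stand-ins, not copied from btrfs-progs. */
#define SB_PRIMARY_OFFSET (64ULL * 1024)  /* 65536 */
#define SB_MIRROR_SHIFT   12
#define SB_MIRROR_MAX     3

/* Byte offset of superblock copy number 'mirror' (0 is the primary). */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror)
		return start << (SB_MIRROR_SHIFT * mirror);
	return SB_PRIMARY_OFFSET;
}

/* Same answer as the hard-coded switch, derived from the formula. */
static int is_sb_offset_computed(uint64_t offset)
{
	int mirror;

	for (mirror = 0; mirror < SB_MIRROR_MAX; mirror++)
		if (sb_offset(mirror) == offset)
			return 1;
	return 0;
}
```

Whether the switch or the loop is preferable is a matter of taste; the
switch in the patch is shorter and avoids a dependency on the helper.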



Re: [PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-06 Thread David Sterba
On Mon, May 06, 2013 at 09:14:17PM +0200, Jan Schmidt wrote:
 --- a/include/uapi/linux/btrfs.h
 +++ b/include/uapi/linux/btrfs.h
 @@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
  struct btrfs_ioctl_quota_rescan_args)
  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
  struct btrfs_ioctl_quota_rescan_args)
 +#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)

Why do you need an ioctl when the same can be achieved by polling the
RESCAN_STATUS value? The code does not do anything special that has to be
done within the kernel.
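
A rough sketch of what such a userspace polling loop could look like, to
make the alternative concrete. Everything here is hypothetical: the
callback type rescan_status_fn stands in for a wrapper around the
BTRFS_IOC_QUOTA_RESCAN_STATUS ioctl, fake_status() only simulates a rescan
that finishes after a few polls, and a real tool would sleep between
iterations.

```c
/*
 * Hypothetical status callback: returns 1 while a rescan is running,
 * 0 once it is idle, and a negative value on error.
 */
typedef int (*rescan_status_fn)(void *ctx);

/*
 * Poll until the rescan is reported as finished, an error occurs, or
 * max_polls is exhausted. Returns 0 when finished, 1 when still running
 * after max_polls, or the negative error from the callback.
 */
static int wait_for_rescan(rescan_status_fn status, void *ctx,
			   unsigned int max_polls, unsigned int *polls_done)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		int ret = status(ctx);

		if (ret <= 0) {
			*polls_done = i + 1;
			return ret;
		}
		/* A real caller would sleep here before asking again. */
	}
	*polls_done = i;
	return 1;
}

/* Demo stand-in: reports "running" until the counter in ctx hits zero. */
static int fake_status(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 1;
	}
	return 0;
}
```

The trade-off is the usual one for polling: no new kernel interface, at the
cost of wakeups and some latency between completion and notification.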

david


[PATCH] btrfs: don't stop searching after encountering the wrong item

2013-05-06 Thread Gabriel de Perthuis
The search ioctl skips items that are too large for a result buffer, but
inline items of a certain size occurring before any search result is
found would trigger an overflow and stop the search entirely.

Cc: sta...@vger.kernel.org
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641

Signed-off-by: Gabriel de Perthuis g2p.code+bt...@gmail.com
---
(resent, with the correct header to have stable copied)

 fs/btrfs/ioctl.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2c02310..f49b62f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1794,23 +1794,23 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
for (i = slot; i < nritems; i++) {
item_off = btrfs_item_ptr_offset(leaf, i);
item_len = btrfs_item_size_nr(leaf, i);
 
-   if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+   btrfs_item_key_to_cpu(leaf, key, i);
+   if (!key_in_sk(key, sk))
+   continue;
+
+   if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
item_len = 0;
 
if (sizeof(sh) + item_len + *sk_offset >
BTRFS_SEARCH_ARGS_BUFSIZE) {
ret = 1;
goto overflow;
}
 
-   btrfs_item_key_to_cpu(leaf, key, i);
-   if (!key_in_sk(key, sk))
-   continue;
-
sh.objectid = key->objectid;
sh.offset = key->offset;
sh.type = key->type;
sh.len = item_len;
sh.transid = found_transid;
-- 
1.8.2.1.419.ga0b97c6
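
To illustrate the effect of the reordering, here is a small standalone
model of the loop, not the kernel code itself. BUFSIZE and HDR are assumed
stand-ins for BTRFS_SEARCH_ARGS_BUFSIZE and sizeof(sh), and struct item,
copy_old() and copy_new() are invented for the example. With a large
non-matching item ahead of a small matching one, the old ordering should
stop with no results while the new ordering returns the match.

```c
#include <stddef.h>

#define BUFSIZE 3992	/* assumed stand-in for BTRFS_SEARCH_ARGS_BUFSIZE */
#define HDR	32	/* assumed stand-in for sizeof(sh) */

struct item {
	int matches;	/* would key_in_sk() accept this item? */
	size_t len;	/* item payload size */
};

/* Old ordering: size checks run before the key filter. */
static int copy_old(const struct item *items, int n, size_t *used)
{
	int i, found = 0;

	for (i = 0; i < n; i++) {
		size_t len = items[i].len;

		if (len > BUFSIZE)
			len = 0;
		if (HDR + len + *used > BUFSIZE)
			break;		/* overflow: search stops here */
		if (!items[i].matches)
			continue;
		*used += HDR + len;
		found++;
	}
	return found;
}

/* New ordering: filter first, and count the header in the size check. */
static int copy_new(const struct item *items, int n, size_t *used)
{
	int i, found = 0;

	for (i = 0; i < n; i++) {
		size_t len = items[i].len;

		if (!items[i].matches)
			continue;
		if (HDR + len > BUFSIZE)
			len = 0;
		if (HDR + len + *used > BUFSIZE)
			break;
		*used += HDR + len;
		found++;
	}
	return found;
}
```

The interesting input is an item whose length lies between BUFSIZE - HDR
and BUFSIZE: it is not zeroed by the old first check, yet it cannot fit
together with its header.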



[PATCH] btrfs-progs: fix typecast when printing csum value

2013-05-06 Thread David Sterba
Only the first byte of the wanted csum is printed:

checksum verify failed on 65536 found DA97CF61 wanted 6B
checksum verify failed on 65536 found DA97CF61 wanted 6BC3870D

Also add leading zeros to the format.

Signed-off-by: David Sterba dste...@suse.cz
---
 disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index b001e35..21b410d 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -89,9 +89,9 @@ int csum_tree_block_size(struct extent_buffer *buf, u16 csum_size,
 
if (verify) {
if (memcmp_extent_buffer(buf, result, 0, csum_size)) {
-   printk("checksum verify failed on %llu found %X "
-  "wanted %X\n", (unsigned long long)buf->start,
-  *((int *)result), *((char *)buf->data));
+   printk("checksum verify failed on %llu found %08X "
+  "wanted %08X\n", (unsigned long long)buf->start,
+  *((u32 *)result), *((u32*)(char *)buf->data));
free(result);
return 1;
}
-- 
1.8.2
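
A minimal standalone illustration of why the old cast printed a single
byte. The helper names and the sample bytes are made up for the example,
and it deviates from the patch in two ways: it reads through unsigned char
to avoid sign extension, and it uses memcpy() instead of a pointer cast to
sidestep alignment concerns. The digits of the 32-bit value depend on the
host byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Old behaviour: dereferencing as a char yields only the first byte. */
static void format_first_byte(const unsigned char *csum, char *out, size_t n)
{
	snprintf(out, n, "%X", (unsigned int)csum[0]);
}

/* Fixed behaviour: read all four bytes and pad to eight hex digits. */
static void format_u32(const unsigned char *csum, char *out, size_t n)
{
	uint32_t v;

	memcpy(&v, csum, sizeof(v));
	snprintf(out, n, "%08X", (unsigned int)v);
}
```

With the sample bytes 6B C3 87 0D, the first helper yields "6B" and the
second always yields eight hex digits.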



Re: [PATCH v3] btrfs: clean snapshots one by one

2013-05-06 Thread Chris Mason
Quoting David Sterba (2013-03-12 11:13:28)
 Each time pick one dead root from the list and let the caller know if
 it's needed to continue. This should improve responsiveness during
 umount and balance which at some point waits for cleaning all currently
 queued dead roots.
 
 A new dead root is added to the end of the list, so the snapshots
 disappear in the order of deletion.
 
 The snapshot cleaning work is now done only from the cleaner thread and the
 others wake it if needed.


 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 988b860..4de2351 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1690,15 +1690,19 @@ static int cleaner_kthread(void *arg)
 struct btrfs_root *root = arg;
  
 do {
 +   int again = 0;
 +
 if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
 +   down_read_trylock(&root->fs_info->sb->s_umount) &&
 mutex_trylock(&root->fs_info->cleaner_mutex)) {
 btrfs_run_delayed_iputs(root);
 -   btrfs_clean_old_snapshots(root);
 +   again = btrfs_clean_one_deleted_snapshot(root);
 mutex_unlock(&root->fs_info->cleaner_mutex);
 btrfs_run_defrag_inodes(root->fs_info);
 +   up_read(&root->fs_info->sb->s_umount);

Can we use just the cleaner mutex for this?  We're deadlocking during
068 with autodefrag on because the cleaner is holding s_umount while
autodefrag is trying to bump the writer count.

If unmount takes the cleaner mutex once it should wait long enough for
the cleaner to stop.

-chris



Re: [RFC 0/5] BTRFS hot relocation support

2013-05-06 Thread Tomasz Torcz
On Mon, May 06, 2013 at 10:36:03PM +0200, Kai Krakow wrote:
 zwu.ker...@gmail.com zwu.ker...@gmail.com wrote:
 
The patchset is trying to introduce hot relocation support
  for BTRFS. In hybrid storage environment, when the data in
  HDD disk get hot, it can be relocated to SSD disk by BTRFS
  hot relocation support automatically; also, if SSD disk ratio
  exceed its upper threshold, the data which get cold can be
  looked up and relocated to HDD disk to make more space in SSD
  disk at first, and then the data which get hot will be relocated
  to SSD disk automatically.
 
 How will it compare to bcache? I'm currently thinking about buying an SSD 
 but bcache requires some efforts in migrating the storage to use. And after 
 all those hassles I am even not sure if it would work easily with a dracut 
 generated initramfs.

  On a side note: dm-cache, which is already in the kernel, does not need
the backing storage to be reformatted.

-- 
Tomasz Torcz              Only gods can safely risk perfection,
xmpp: zdzich...@chrome.pl it's a dangerous thing for a man.  -- Alia
