Re: [RFC PATCH 0/3] apply the Probabilistic Skiplist on btrfs

2012-01-06 Thread Liu Bo
On 01/06/2012 07:51 AM, David Sterba wrote:
 Hi, I've let it run through xfstests and ended at 091, patches applied
 on top of 3.2, mount options
 compress-force=lzo,discard,inode_cache,space_cache,autodefrag
 fresh mkfs with defaults.
 


Hi David,

Thanks a lot for your work!

I also found this and have fixed it.

I will send a V2 patchset once it has gone through xfstests.

thanks,
liubo


 [ 1081.623819] btrfs: force lzo compression
 [ 1081.629166] btrfs: enabling inode map caching
 [ 1081.634853] btrfs: enabling auto defrag
 [ 1081.638569] btrfs: disk space caching is enabled
 [ 1119.693957] [ cut here ]
 [ 1119.697876] kernel BUG at fs/btrfs/file.c:530!
 [ 1119.697876] invalid opcode:  [#1] SMP
 [ 1119.697876] CPU 1
 [ 1119.697876] Modules linked in: loop btrfs aoe
 [ 1119.697876]
 [ 1119.697876] Pid: 25819, comm: fsx Not tainted 3.2.0-default+ #95 Intel 
 Corporation Santa Rosa platform/Matanzas
 [ 1119.697876] RIP: 0010:[a0048a18]  [a0048a18] 
 btrfs_drop_extent_cache+0x3f8/0x400 [btrfs]
 [ 1119.697876] RSP: 0018:88000c47f698  EFLAGS: 00010282
 [ 1119.697876] RAX: ffef RBX: 88006ff01e48 RCX: 
 00026fff
 [ 1119.697876] RDX: 88006ed5d830 RSI: 00022000 RDI: 
 
 [ 1119.697876] RBP: 88000c47f738 R08:  R09: 
 00022000
 [ 1119.697876] R10: fffe R11: 00026fff R12: 
 88001ada9e48
 [ 1119.697876] R13: 0001f000 R14:  R15: 
 88000c47f708
 [ 1119.697876] FS:  7f262e570700() GS:88007de0() 
 knlGS:
 [ 1119.697876] CS:  0010 DS:  ES:  CR0: 8005003b
 [ 1119.697876] CR2: 7fc4364fc000 CR3: 79435000 CR4: 
 06e0
 [ 1119.697876] DR0:  DR1:  DR2: 
 
 [ 1119.697876] DR3:  DR6: 0ff0 DR7: 
 0400
 [ 1119.697876] Process fsx (pid: 25819, threadinfo 88000c47e000, task 
 880063640700)
 [ 1119.697876] Stack:
 [ 1119.697876]  8800 81092040 88000c47f6f0 
 01000246
 [ 1119.697876]  0001  3000 
 88006e5c44f0
 [ 1119.697876]  88006e5c43e0   
 
 [ 1119.697876] Call Trace:
 [ 1119.697876]  [81092040] ? trace_hardirqs_on_caller+0x20/0x1d0
 [ 1119.697876]  [a003a0b0] ? csum_exist_in_range+0xa0/0xa0 [btrfs]
 [ 1119.697876]  [a003f296] cow_file_range+0x136/0x3e0 [btrfs]
 [ 1119.697876]  [810921fd] ? trace_hardirqs_on+0xd/0x10
 [ 1119.697876]  [a003f8a7] run_delalloc_nocow+0x367/0x820 [btrfs]
 [ 1119.697876]  [81357dae] ? do_raw_spin_unlock+0x5e/0xb0
 [ 1119.697876]  [a00400c9] run_delalloc_range+0x369/0x370 [btrfs]
 [ 1119.697876]  [a00582c0] __extent_writepage+0x5f0/0x750 [btrfs]
 [ 1119.697876]  [81349f4d] ? 
 radix_tree_gang_lookup_tag_slot+0x8d/0xd0
 [ 1119.697876]  [810f30d1] ? find_get_pages_tag+0x111/0x1b0
 [ 1119.697876]  [a0058692] 
 extent_write_cache_pages.clone.0+0x272/0x3f0 [btrfs]
 [ 1119.697876]  [81357dae] ? do_raw_spin_unlock+0x5e/0xb0
 [ 1119.697876]  [81131604] ? kfree+0xd4/0x180
 [ 1119.697876]  [81092040] ? trace_hardirqs_on_caller+0x20/0x1d0
 [ 1119.697876]  [a0058a56] extent_writepages+0x46/0x60 [btrfs]
 [ 1119.697876]  [a003b590] ? acls_after_inode_item+0xd0/0xd0 [btrfs]
 [ 1119.697876]  [a003ad17] btrfs_writepages+0x27/0x30 [btrfs]
 [ 1120.018734]  [810fdcc4] do_writepages+0x24/0x40
 [ 1120.018734]  [810f3cdb] __filemap_fdatawrite_range+0x5b/0x60
 [ 1120.018734]  [810f3d3a] filemap_write_and_wait_range+0x5a/0x80
 [ 1120.018734]  [a004859a] btrfs_file_aio_write+0x4da/0x560 [btrfs]
 [ 1120.018734]  [8113a852] do_sync_write+0xe2/0x120
 [ 1120.018734]  [8187d2ad] ? __mutex_unlock_slowpath+0xdd/0x180
 [ 1120.018734]  [8187d35e] ? mutex_unlock+0xe/0x10
 [ 1120.018734]  [a004703f] ? btrfs_file_llseek+0x6f/0x390 [btrfs]
 [ 1120.018734]  [8113b15e] vfs_write+0xce/0x190
 [ 1120.018734]  [8113b4a4] sys_write+0x54/0xa0
 [ 1120.018734]  [81887a82] system_call_fastpath+0x16/0x1b
 [ 1120.018734] Code: 5e 41 5f c9 c3 0f 0b be bf 01 00 00 48 c7 c7 e6 02 09 a0 
 48 89 95 68 ff ff ff e8 e4 a2 00 e1 48 8b 95 68 ff ff ff e9 3c fc ff ff 0f 
 0b 0f 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 53
 [ 1120.018734] RIP  [a0048a18] btrfs_drop_extent_cache+0x3f8/0x400 
 [btrfs]
 [ 1120.018734]  RSP 88000c47f698
 [ 1120.047841] ---[ end trace ca0f509767e0195d ]---
 
 xfstests/091 output:
 
 091 57s ... [19:47:50] [19:48:28] [failed, exit status 1] - output 
 mismatch (see 091.out.bad)
 --- 091.out 2011-11-01 10:31:12.0 +0100
 +++ 091.out.bad 2012-01-05 19:48:28.0 +0100
 @@ -5,3 +5,41 @@
  fsx -N 1 -o 8192 -l 50 -r 

Re: [PATCH 0/2] btrfs: allow cross-subvolume BTRFS_IOC_CLONE

2012-01-06 Thread Konstantinos Skarlatos

On 22/12/2011 2:24 PM, Chris Samuel wrote:

Christoph,

On Sat, 2 Apr 2011 12:40:11 AM Chris Mason wrote:


Excerpts from Christoph Hellwig's message of 2011-04-01 09:34:05 -0400:



I don't think it's a good idea to introduce any user visible
operations over subvolume boundaries.  Currently we don't have
any operations over mount boundaries, which is pretty
fundamental to the unix filesystem semantics.  If you want to
change this please come up with a clear description of the
semantics and post it to linux-fsdevel for discussion.  That of
course requires a clear description of the btrfs subvolumes,
which is still completely missing.


The subvolume is just a directory tree that can be snapshotted, and
has its own private inode number space.

reflink across subvolumes is no different from copying a file from
one subvolume to another at the VFS level.  The src and
destination are different files and different inodes, they just
happen to share data extents.


Were Chris Mason's points above enough to sway your opposition to this
functionality/patch?

There is demand for the ability to move data between subvolumes
without needing to copy the extents themselves; it has cropped up again
on the list in recent days.

It seems a little hard (and counterintuitive) to enforce a wasteful
copy of data between different parts of the same filesystem that
happen to be on different subvolumes, when the same operation is
permitted and functional within a single subvolume.

I don't dispute the comment about documentation on subvolumes though,
there is a short discussion of them on the btrfs wiki in the sysadmins
guide, but not really a lot of detail. :-)

All the best,
Chris


I too want cp --reflink across subvolumes. Please make this feature
available to us, as it's a poor man's dedupe and would give big space
savings for many use cases.
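
For readers unfamiliar with what cp --reflink does underneath, here is a minimal userspace sketch of the clone ioctl this thread is about. reflink_file() is a hypothetical helper name, and the ioctl number is defined by hand on the assumption that it matches the kernel's btrfs ioctl header. Whether the call succeeds when source and destination sit in different subvolumes depends on the kernel; without the patch under discussion it is expected to fail.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match BTRFS_IOC_CLONE in the kernel's btrfs ioctl header. */
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_CLONE _IOW(BTRFS_IOCTL_MAGIC, 9, int)

/*
 * Hypothetical helper: make dst share the data extents of src.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int reflink_file(const char *src, const char *dst)
{
	int src_fd, dst_fd, ret, saved_errno;

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -1;

	dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst_fd < 0) {
		saved_errno = errno;
		close(src_fd);
		errno = saved_errno;
		return -1;
	}

	/* The ioctl is issued on the destination; the argument is the source fd. */
	ret = ioctl(dst_fd, BTRFS_IOC_CLONE, src_fd);
	saved_errno = errno;

	close(dst_fd);
	close(src_fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

On failure a caller would normally fall back to an ordinary byte-for-byte copy.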

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How long does it take to balance a 2x1TB RAID1 ?

2012-01-06 Thread Dirk Lutzebäck

Hi,

I have set up a btrfs RAID1 using two 1TB drives. How long should a
'btrfs filesystem balance' take? It has now been running for more than
3 days at about 30% CPU and 40% wait state.

I am using the stock btrfs from Ubuntu 11.10 (kernel 3.0.0).

Regards

Dirk





[PATCH 01/21] Btrfs: get rid of *_alloc_profile fields

2012-01-06 Thread Ilya Dryomov
{data,metadata,system}_alloc_profile fields have been unused for a long
time now.  Get rid of them.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |3 ---
 fs/btrfs/disk-io.c |3 ---
 fs/btrfs/extent-tree.c |   10 --
 fs/btrfs/volumes.c |6 ++
 4 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6738503..f5434ad 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1135,9 +1135,6 @@ struct btrfs_fs_info {
u64 avail_data_alloc_bits;
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
-   u64 data_alloc_profile;
-   u64 metadata_alloc_profile;
-   u64 system_alloc_profile;
 
unsigned data_chunk_allocations;
unsigned metadata_ratio;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f9d555..ce9d0fb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2321,9 +2321,6 @@ retry_root_backup:
 
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
-   fs_info->data_alloc_profile = (u64)-1;
-   fs_info->metadata_alloc_profile = (u64)-1;
-   fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
 
ret = btrfs_init_space_info(fs_info);
if (ret) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8603ee4..f0591fd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3067,14 +3067,12 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
 {
if (flags & BTRFS_BLOCK_GROUP_DATA)
-   flags |= root->fs_info->avail_data_alloc_bits &
-root->fs_info->data_alloc_profile;
+   flags |= root->fs_info->avail_data_alloc_bits;
else if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
-   flags |= root->fs_info->avail_system_alloc_bits &
-root->fs_info->system_alloc_profile;
+   flags |= root->fs_info->avail_system_alloc_bits;
else if (flags & BTRFS_BLOCK_GROUP_METADATA)
-   flags |= root->fs_info->avail_metadata_alloc_bits &
-root->fs_info->metadata_alloc_profile;
+   flags |= root->fs_info->avail_metadata_alloc_bits;
+
return btrfs_reduce_alloc_profile(root, flags);
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4b839f..89096f6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2752,8 +2752,7 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans,
return ret;
 
alloc_profile = BTRFS_BLOCK_GROUP_METADATA |
-   (fs_info->metadata_alloc_profile &
-fs_info->avail_metadata_alloc_bits);
+   fs_info->avail_metadata_alloc_bits;
alloc_profile = btrfs_reduce_alloc_profile(root, alloc_profile);
 
ret = __btrfs_alloc_chunk(trans, extent_root, map, chunk_size,
@@ -2763,8 +2762,7 @@ static noinline int init_first_rw_device(struct btrfs_trans_handle *trans,
sys_chunk_offset = chunk_offset + chunk_size;
 
alloc_profile = BTRFS_BLOCK_GROUP_SYSTEM |
-   (fs_info->system_alloc_profile &
-fs_info->avail_system_alloc_bits);
+   fs_info->avail_system_alloc_bits;
alloc_profile = btrfs_reduce_alloc_profile(root, alloc_profile);
 
ret = __btrfs_alloc_chunk(trans, extent_root, sys_map,
-- 
1.7.6.3



[PATCH 02/21] Btrfs: introduce masks for chunk type and profile

2012-01-06 Thread Ilya Dryomov
Chunk's type and profile are encoded in u64 flags field.  Introduce
masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
constants, it should be ULL.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |   26 +-
 fs/btrfs/extent-tree.c |   12 +++-
 fs/btrfs/volumes.c |   11 ++-
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f5434ad..4370a56 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -751,15 +751,23 @@ struct btrfs_csum_item {
 } __attribute__ ((__packed__));
 
 /* different types of block groups (and chunks) */
-#define BTRFS_BLOCK_GROUP_DATA     (1 << 0)
-#define BTRFS_BLOCK_GROUP_SYSTEM   (1 << 1)
-#define BTRFS_BLOCK_GROUP_METADATA (1 << 2)
-#define BTRFS_BLOCK_GROUP_RAID0    (1 << 3)
-#define BTRFS_BLOCK_GROUP_RAID1    (1 << 4)
-#define BTRFS_BLOCK_GROUP_DUP      (1 << 5)
-#define BTRFS_BLOCK_GROUP_RAID10   (1 << 6)
-#define BTRFS_NR_RAID_TYPES        5
-
+#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
+#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
+#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)
+#define BTRFS_BLOCK_GROUP_RAID0    (1ULL << 3)
+#define BTRFS_BLOCK_GROUP_RAID1    (1ULL << 4)
+#define BTRFS_BLOCK_GROUP_DUP      (1ULL << 5)
+#define BTRFS_BLOCK_GROUP_RAID10   (1ULL << 6)
+#define BTRFS_NR_RAID_TYPES        5
+
+#define BTRFS_BLOCK_GROUP_TYPE_MASK    (BTRFS_BLOCK_GROUP_DATA |    \
+                                        BTRFS_BLOCK_GROUP_SYSTEM |  \
+                                        BTRFS_BLOCK_GROUP_METADATA)
+
+#define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 |   \
+                                        BTRFS_BLOCK_GROUP_RAID1 |   \
+                                        BTRFS_BLOCK_GROUP_DUP |     \
+                                        BTRFS_BLOCK_GROUP_RAID10)
 struct btrfs_block_group_item {
__le64 used;
__le64 chunk_objectid;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f0591fd..a8d8204 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -618,8 +618,7 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
struct list_head *head = &info->space_info;
struct btrfs_space_info *found;
 
-   flags &= BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_SYSTEM |
-BTRFS_BLOCK_GROUP_METADATA;
+   flags &= BTRFS_BLOCK_GROUP_TYPE_MASK;
 
rcu_read_lock();
list_for_each_entry_rcu(found, head, list) {
@@ -2993,9 +2992,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
INIT_LIST_HEAD(&found->block_groups[i]);
init_rwsem(&found->groups_sem);
spin_lock_init(&found->lock);
-   found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
-   BTRFS_BLOCK_GROUP_SYSTEM |
-   BTRFS_BLOCK_GROUP_METADATA);
+   found->flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
found->total_bytes = total_bytes;
found->disk_total = total_bytes * factor;
found->bytes_used = bytes_used;
@@ -3016,10 +3013,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 
 static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
-   u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 |
-  BTRFS_BLOCK_GROUP_RAID1 |
-  BTRFS_BLOCK_GROUP_RAID10 |
-  BTRFS_BLOCK_GROUP_DUP);
+   u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
if (extra_flags) {
if (flags & BTRFS_BLOCK_GROUP_DATA)
fs_info->avail_data_alloc_bits |= extra_flags;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 89096f6..d5fdee5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2949,12 +2949,8 @@ again:
}
}
if (rw & REQ_DISCARD) {
-   if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
-BTRFS_BLOCK_GROUP_RAID1 |
-BTRFS_BLOCK_GROUP_DUP |
-BTRFS_BLOCK_GROUP_RAID10)) {
+   if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK)
stripes_required = map->num_stripes;
-   }
}
if (bbio_ret && (rw & (REQ_WRITE | REQ_DISCARD)) &&
stripes_allocated < stripes_required) {
@@ -2978,10 +2974,7 @@ again:
 
if (rw & REQ_DISCARD)
*length = min_t(u64, em->len - offset, *length);
-   else if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_DUP)) {
+   else if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
/* we limit the length of each bio to what fits in a stripe */
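
A small userspace sketch of what the two masks buy: splitting a chunk's u64 flags into its type bits and its profile bits. The constants are copied from the patch above; chunk_type_bits() and chunk_profile_bits() are hypothetical helper names used only for illustration. The ULL suffix matters because the flags field is 64 bits wide, and a plain int constant cannot safely be shifted into the upper half.

```c
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA	(1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)

#define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |	\
					 BTRFS_BLOCK_GROUP_SYSTEM |	\
					 BTRFS_BLOCK_GROUP_METADATA)

#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)

/* Hypothetical helper: keep only the data/system/metadata bits. */
static uint64_t chunk_type_bits(uint64_t flags)
{
	return flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
}

/* Hypothetical helper: keep only the RAID/DUP bits; 0 means SINGLE. */
static uint64_t chunk_profile_bits(uint64_t flags)
{
	return flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
}
```

For a metadata chunk stored as RAID1, chunk_type_bits() yields BTRFS_BLOCK_GROUP_METADATA and chunk_profile_bits() yields BTRFS_BLOCK_GROUP_RAID1.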
  

[PATCH 03/21] Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit

2012-01-06 Thread Ilya Dryomov
Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
avail_{data,metadata,system}_alloc_bits fields, which gather info about
available allocation profiles in the FS.  When a chunk is created or read
from disk, its profile is OR'ed with the corresponding avail_alloc_bits
field.  Since SINGLE is denoted by 0 in the on-disk format, currently
there is no way to tell when such chunks become available.  Restriper
needs that information, so add a separate bit for SINGLE profile.

This bit is going to be in-memory only, it should never be written out
to disk, so it's not a disk format change.  However to avoid remappings
in future, reserve corresponding on-disk bit.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |   15 +++
 fs/btrfs/extent-tree.c |   30 +-
 2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4370a56..3f8f11e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -758,6 +758,7 @@ struct btrfs_csum_item {
 #define BTRFS_BLOCK_GROUP_RAID1    (1ULL << 4)
 #define BTRFS_BLOCK_GROUP_DUP      (1ULL << 5)
 #define BTRFS_BLOCK_GROUP_RAID10   (1ULL << 6)
+#define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE
 #define BTRFS_NR_RAID_TYPES        5
 
 #define BTRFS_BLOCK_GROUP_TYPE_MASK    (BTRFS_BLOCK_GROUP_DATA |    \
@@ -768,6 +769,15 @@ struct btrfs_csum_item {
 BTRFS_BLOCK_GROUP_RAID1 |   \
 BTRFS_BLOCK_GROUP_DUP | \
 BTRFS_BLOCK_GROUP_RAID10)
+/*
+ * We need a bit for restriper to be able to tell when chunks of type
+ * SINGLE are available.  This extended profile format is used in
+ * fs_info->avail_*_alloc_bits (in-memory) and balance item fields
+ * (on-disk).  The corresponding on-disk bit in chunk.type is reserved
+ * to avoid remappings between two formats in future.
+ */
+#define BTRFS_AVAIL_ALLOC_BIT_SINGLE   (1ULL << 48)
+
 struct btrfs_block_group_item {
__le64 used;
__le64 chunk_objectid;
@@ -1140,6 +1150,11 @@ struct btrfs_fs_info {
spinlock_t ref_cache_lock;
u64 total_ref_cache_size;
 
+   /*
+* these three are in extended format (availability of single
+* chunks is denoted by BTRFS_AVAIL_ALLOC_BIT_SINGLE bit, other
+* types are denoted by corresponding BTRFS_BLOCK_GROUP_* bits)
+*/
u64 avail_data_alloc_bits;
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a8d8204..15a2294 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3014,16 +3014,24 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
-   if (extra_flags) {
-   if (flags & BTRFS_BLOCK_GROUP_DATA)
-   fs_info->avail_data_alloc_bits |= extra_flags;
-   if (flags & BTRFS_BLOCK_GROUP_METADATA)
-   fs_info->avail_metadata_alloc_bits |= extra_flags;
-   if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
-   fs_info->avail_system_alloc_bits |= extra_flags;
-   }
+
+   /* chunk -> extended profile */
+   if (extra_flags == 0)
+   extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+   if (flags & BTRFS_BLOCK_GROUP_DATA)
+   fs_info->avail_data_alloc_bits |= extra_flags;
+   if (flags & BTRFS_BLOCK_GROUP_METADATA)
+   fs_info->avail_metadata_alloc_bits |= extra_flags;
+   if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+   fs_info->avail_system_alloc_bits |= extra_flags;
 }
 
+/*
+ * @flags: available profiles in extended format (see ctree.h)
+ *
+ * Returns reduced profile in chunk format.
+ */
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 {
/*
@@ -3053,8 +3061,12 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
if ((flags & BTRFS_BLOCK_GROUP_RAID0) &&
((flags & BTRFS_BLOCK_GROUP_RAID1) |
 (flags & BTRFS_BLOCK_GROUP_RAID10) |
-(flags & BTRFS_BLOCK_GROUP_DUP)))
+(flags & BTRFS_BLOCK_GROUP_DUP))) {
flags &= ~BTRFS_BLOCK_GROUP_RAID0;
+   }
+
+   /* extended -> chunk profile */
+   flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
return flags;
 }
 
-- 
1.7.6.3
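
To make the two formats concrete, here is a userspace sketch of the conversions the patch performs inline. The constants are copied from the patch; chunk_to_extended() and extended_to_chunk() are names chosen for this illustration and do not appear in the patch itself.

```c
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 48)

/*
 * chunk -> extended profile: a chunk with no profile bits set is SINGLE,
 * which the extended format represents with its own bit.
 */
static uint64_t chunk_to_extended(uint64_t chunk_flags)
{
	uint64_t profile = chunk_flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	return profile ? profile : BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

/*
 * extended -> chunk profile: drop the in-memory-only SINGLE bit so that
 * it is never written into a chunk's flags.
 */
static uint64_t extended_to_chunk(uint64_t extended_flags)
{
	return extended_flags & ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}
```

With these, an avail_*_alloc_bits value of RAID1 | SINGLE means both RAID1 and single chunks exist, which the plain chunk format has no way to express.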



[PATCH 04/21] Btrfs: make avail_*_alloc_bits fields dynamic

2012-01-06 Thread Ilya Dryomov
Currently when new chunks are created respective avail_alloc_bits field
is updated to reflect profiles of all chunks present in the system.
However when chunks are removed profile bits are never cleared.

This patch clears profile bit of respective avail_alloc_bits field when
the last chunk with that profile is removed.  Restriper needs this to
properly operate when downgrading.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/extent-tree.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 15a2294..946b067 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7469,6 +7469,22 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
return 0;
 }
 
+static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
+{
+   u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+   /* chunk -> extended profile */
+   if (extra_flags == 0)
+   extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+   if (flags & BTRFS_BLOCK_GROUP_DATA)
+   fs_info->avail_data_alloc_bits &= ~extra_flags;
+   if (flags & BTRFS_BLOCK_GROUP_METADATA)
+   fs_info->avail_metadata_alloc_bits &= ~extra_flags;
+   if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+   fs_info->avail_system_alloc_bits &= ~extra_flags;
+}
+
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 struct btrfs_root *root, u64 group_start)
 {
@@ -7479,6 +7495,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct btrfs_key key;
struct inode *inode;
int ret;
+   int index;
int factor;
 
root = root->fs_info->extent_root;
@@ -7494,6 +7511,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
free_excluded_extents(root, block_group);
 
memcpy(&key, &block_group->key, sizeof(key));
+   index = get_block_group_index(block_group);
if (block_group->flags & (BTRFS_BLOCK_GROUP_DUP |
  BTRFS_BLOCK_GROUP_RAID1 |
  BTRFS_BLOCK_GROUP_RAID10))
@@ -7568,6 +7586,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 * are still on the list after taking the semaphore
 */
list_del_init(&block_group->list);
+   if (list_empty(&block_group->space_info->block_groups[index]))
+   clear_avail_alloc_bits(root->fs_info, block_group->flags);
up_write(&block_group->space_info->groups_sem);
 
if (block_group->cached == BTRFS_CACHE_STARTED)
-- 
1.7.6.3
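
The idea of clearing a bit only when the last chunk of that profile goes away can be sketched in userspace as follows. This is an illustration under assumptions: the patch detects "last chunk" with list_empty() on the per-profile block group list, whereas this sketch swaps in simple per-profile counters, and struct avail_tracker and its helpers are hypothetical names. Only a single avail field is modelled.

```c
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 48)

#define NR_PROFILE_SLOTS 5

/* Hypothetical stand-in for one avail_*_alloc_bits field and its chunks. */
struct avail_tracker {
	uint64_t avail_bits;			/* extended format */
	unsigned int nr_chunks[NR_PROFILE_SLOTS];	/* counters replace the lists */
};

/* Map a chunk's profile to a counter slot; slot 4 is SINGLE. */
static int profile_slot(uint64_t flags)
{
	if (flags & BTRFS_BLOCK_GROUP_RAID10)
		return 0;
	if (flags & BTRFS_BLOCK_GROUP_RAID1)
		return 1;
	if (flags & BTRFS_BLOCK_GROUP_DUP)
		return 2;
	if (flags & BTRFS_BLOCK_GROUP_RAID0)
		return 3;
	return 4;
}

/* chunk -> extended profile */
static uint64_t extended_profile(uint64_t flags)
{
	uint64_t profile = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	return profile ? profile : BTRFS_AVAIL_ALLOC_BIT_SINGLE;
}

static void tracker_add_chunk(struct avail_tracker *t, uint64_t flags)
{
	t->nr_chunks[profile_slot(flags)]++;
	t->avail_bits |= extended_profile(flags);
}

static void tracker_remove_chunk(struct avail_tracker *t, uint64_t flags)
{
	int slot = profile_slot(flags);

	if (t->nr_chunks[slot] == 0)
		return;
	/* Clear the bit only when the last chunk with this profile is gone. */
	if (--t->nr_chunks[slot] == 0)
		t->avail_bits &= ~extended_profile(flags);
}
```

Removing one of two RAID1 chunks leaves the RAID1 bit set; removing the second clears it, which is the behaviour the commit message says restriper needs when downgrading.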



[PATCH 06/21] Btrfs: add basic infrastructure for selective balancing

2012-01-06 Thread Ilya Dryomov
This allows a separate set of filters for each chunk type
(data, meta, sys).  The code, however, is generic, and the switch on
chunk type is done only once.

This commit also adds a type filter, which makes it possible to balance,
for example, meta and system chunks w/o touching data ones.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c   |3 ++
 fs/btrfs/volumes.c |   58 ++-
 fs/btrfs/volumes.h |   11 +
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 221bae0..e20d0cb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3101,6 +3101,9 @@ static long btrfs_ioctl_balance(struct btrfs_root *root, void __user *arg)
memcpy(&bctl->sys, &bargs->sys, sizeof(bctl->sys));
 
bctl->flags = bargs->flags;
+   } else {
+   /* balance everything - no filters */
+   bctl->flags |= BTRFS_BALANCE_TYPE_MASK;
}
 
ret = btrfs_balance(bctl, 0);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e8d6e78..0e6cddd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2102,6 +2102,30 @@ static void unset_balance_control(struct btrfs_fs_info *fs_info)
kfree(bctl);
 }
 
+static int should_balance_chunk(struct btrfs_root *root,
+   struct extent_buffer *leaf,
+   struct btrfs_chunk *chunk, u64 chunk_offset)
+{
+   struct btrfs_balance_control *bctl = root->fs_info->balance_ctl;
+   struct btrfs_balance_args *bargs = NULL;
+   u64 chunk_type = btrfs_chunk_type(leaf, chunk);
+
+   /* type filter */
+   if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
+ (bctl->flags & BTRFS_BALANCE_TYPE_MASK))) {
+   return 0;
+   }
+
+   if (chunk_type & BTRFS_BLOCK_GROUP_DATA)
+   bargs = &bctl->data;
+   else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
+   bargs = &bctl->sys;
+   else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
+   bargs = &bctl->meta;
+
+   return 1;
+}
+
 static u64 div_factor(u64 num, int factor)
 {
if (factor == 10)
@@ -2119,10 +2143,13 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
struct btrfs_device *device;
u64 old_size;
u64 size_to_free;
+   struct btrfs_chunk *chunk;
struct btrfs_path *path;
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_trans_handle *trans;
+   struct extent_buffer *leaf;
+   int slot;
int ret;
int enospc_errors = 0;
 
@@ -2177,8 +2204,10 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
if (ret)
BUG(); /* FIXME break ? */
 
-   btrfs_item_key_to_cpu(path->nodes[0], &found_key,
- path->slots[0]);
+   leaf = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
if (found_key.objectid != key.objectid)
break;
 
@@ -2186,7 +2215,14 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
if (found_key.offset == 0)
break;
 
+   chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
+
+   ret = should_balance_chunk(chunk_root, leaf, chunk,
+  found_key.offset);
btrfs_release_path(path);
+   if (!ret)
+   goto loop;
+
ret = btrfs_relocate_chunk(chunk_root,
   chunk_root->root_key.objectid,
   found_key.objectid,
@@ -2195,6 +2231,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
goto error;
if (ret == -ENOSPC)
enospc_errors++;
+loop:
key.offset = found_key.offset - 1;
}
 
@@ -2221,11 +2258,28 @@ static void __cancel_balance(struct btrfs_fs_info *fs_info)
 int btrfs_balance(struct btrfs_balance_control *bctl, int resume)
 {
struct btrfs_fs_info *fs_info = bctl->fs_info;
+   u64 allowed;
int ret;
 
if (btrfs_fs_closing(fs_info)) {
ret = -EINVAL;
goto out;
+
+   /*
+* In case of mixed groups both data and meta should be picked,
+* and identical options should be given for both of them.
+*/
+   allowed = btrfs_super_incompat_flags(fs_info->super_copy);
+   if ((allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+   (bctl->flags & (BTRFS_BALANCE_DATA | BTRFS_BALANCE_METADATA))) {
+   if (!(bctl->flags & BTRFS_BALANCE_DATA) ||
+   !(bctl->flags & BTRFS_BALANCE_METADATA) ||
+   memcmp(&bctl->data, &bctl->meta, sizeof(bctl->data))) {
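
(The archived copy of this patch is cut off at this point.) As an illustration of the type filter in should_balance_chunk() above, here is a minimal userspace sketch. It assumes the BTRFS_BALANCE_DATA/SYSTEM/METADATA bits occupy the same positions as the corresponding BTRFS_BLOCK_GROUP_* type bits, which the direct AND of the two masked values in the patch implies; their definitions are not visible in this excerpt, so the values below are an assumption. type_filter_passes() is a hypothetical name.

```c
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA	(1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_TYPE_MASK	(BTRFS_BLOCK_GROUP_DATA |	\
					 BTRFS_BLOCK_GROUP_SYSTEM |	\
					 BTRFS_BLOCK_GROUP_METADATA)

/* Assumed values: same bit positions as the block group type bits. */
#define BTRFS_BALANCE_DATA		(1ULL << 0)
#define BTRFS_BALANCE_SYSTEM		(1ULL << 1)
#define BTRFS_BALANCE_METADATA		(1ULL << 2)
#define BTRFS_BALANCE_TYPE_MASK		(BTRFS_BALANCE_DATA |	\
					 BTRFS_BALANCE_SYSTEM |	\
					 BTRFS_BALANCE_METADATA)

/*
 * Hypothetical helper: return 1 if a chunk of this type was selected
 * by the balance flags, 0 if the type filter rejects it.
 */
static int type_filter_passes(uint64_t chunk_type, uint64_t balance_flags)
{
	return ((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
		(balance_flags & BTRFS_BALANCE_TYPE_MASK)) != 0;
}
```

With balance_flags set to BTRFS_BALANCE_SYSTEM | BTRFS_BALANCE_METADATA, metadata and system chunks pass while data chunks are skipped, matching the example in the commit message.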
+   

[PATCH 07/21] Btrfs: profiles filter

2012-01-06 Thread Ilya Dryomov
Select chunks based on a given profile mask.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   24 
 fs/btrfs/volumes.h |4 
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e6cddd..315a6c2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2102,6 +2102,24 @@ static void unset_balance_control(struct btrfs_fs_info *fs_info)
kfree(bctl);
 }
 
+/*
+ * Balance filters.  Return 1 if chunk should be filtered out
+ * (should not be balanced).
+ */
+static int chunk_profiles_filter(u64 chunk_profile,
+struct btrfs_balance_args *bargs)
+{
+   chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+   if (chunk_profile == 0)
+   chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+   if (bargs->profiles & chunk_profile)
+   return 0;
+
+   return 1;
+}
+
 static int should_balance_chunk(struct btrfs_root *root,
struct extent_buffer *leaf,
struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2123,6 +2141,12 @@ static int should_balance_chunk(struct btrfs_root *root,
else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
bargs = &bctl->meta;
 
+   /* profiles filter */
+   if ((bargs->flags & BTRFS_BALANCE_ARGS_PROFILES) &&
+   chunk_profiles_filter(chunk_type, bargs)) {
+   return 0;
+   }
+
return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 7a78051..9c95c13 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -196,6 +196,10 @@ struct map_lookup {
 #define BTRFS_BALANCE_TYPE_MASK        (BTRFS_BALANCE_DATA |   \
                                         BTRFS_BALANCE_SYSTEM | \
                                         BTRFS_BALANCE_METADATA)
+/*
+ * Balance filters
+ */
+#define BTRFS_BALANCE_ARGS_PROFILES    (1ULL << 0)
 
 struct btrfs_balance_args;
 struct btrfs_balance_control {
-- 
1.7.6.3
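
A userspace sketch of the profiles filter follows, keeping the patch's convention that a return value of 1 means "filtered out". The wanted mask is passed directly instead of through struct btrfs_balance_args, which is a simplification made for this illustration; the constants are taken from the patches in this series.

```c
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP		(1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |	\
					 BTRFS_BLOCK_GROUP_RAID1 |	\
					 BTRFS_BLOCK_GROUP_DUP |	\
					 BTRFS_BLOCK_GROUP_RAID10)
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 48)

/*
 * Return 1 if the chunk should be filtered out (not balanced),
 * 0 if its profile is in the wanted mask.  wanted_profiles is in
 * extended format, so SINGLE chunks can be selected explicitly.
 */
static int chunk_profiles_filter(uint64_t chunk_flags, uint64_t wanted_profiles)
{
	uint64_t chunk_profile = chunk_flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;

	if (chunk_profile == 0)
		chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;

	if (wanted_profiles & chunk_profile)
		return 0;

	return 1;
}
```

Without the extended format a request for "single chunks only" could not be expressed, since SINGLE is 0 in the on-disk profile bits.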



[PATCH 08/21] Btrfs: usage filter

2012-01-06 Thread Ilya Dryomov
Select chunks that are less than X percent full.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   36 
 fs/btrfs/volumes.h |1 +
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 315a6c2..e252cf2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2120,6 +2120,36 @@ static int chunk_profiles_filter(u64 chunk_profile,
return 1;
 }
 
+static u64 div_factor_fine(u64 num, int factor)
+{
+   if (factor <= 0)
+   return 0;
+   if (factor >= 100)
+   return num;
+
+   num *= factor;
+   do_div(num, 100);
+   return num;
+}
+
+static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
+ struct btrfs_balance_args *bargs)
+{
+   struct btrfs_block_group_cache *cache;
+   u64 chunk_used, user_thresh;
+   int ret = 1;
+
+   cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+   chunk_used = btrfs_block_group_used(&cache->item);
+
+   user_thresh = div_factor_fine(cache->key.offset, bargs->usage);
+   if (chunk_used < user_thresh)
+   ret = 0;
+
+   btrfs_put_block_group(cache);
+   return ret;
+}
+
 static int should_balance_chunk(struct btrfs_root *root,
struct extent_buffer *leaf,
struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2147,6 +2177,12 @@ static int should_balance_chunk(struct btrfs_root *root,
return 0;
}
 
+   /* usage filter */
+   if ((bargs->flags & BTRFS_BALANCE_ARGS_USAGE) &&
+   chunk_usage_filter(bctl->fs_info, chunk_offset, bargs)) {
+   return 0;
+   }
+
return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9c95c13..23b53bb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -200,6 +200,7 @@ struct map_lookup {
  * Balance filters
  */
 #define BTRFS_BALANCE_ARGS_PROFILES    (1ULL << 0)
+#define BTRFS_BALANCE_ARGS_USAGE       (1ULL << 1)
 
 struct btrfs_balance_args;
 struct btrfs_balance_control {
-- 
1.7.6.3
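
The threshold arithmetic of the usage filter can be tried out in userspace with the sketch below. It passes the chunk size and used bytes directly instead of looking up a block group, and uses plain 64-bit division in place of the kernel's do_div(); both are simplifications for illustration.

```c
#include <stdint.h>

/* Return factor percent of num, clamping factor to the range 0..100. */
static uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;

	num *= (uint64_t)factor;
	return num / 100;
}

/*
 * Return 0 if the chunk is less than usage_percent full (balance it),
 * 1 if it should be filtered out.
 */
static int chunk_usage_filter(uint64_t chunk_size, uint64_t chunk_used,
			      int usage_percent)
{
	uint64_t user_thresh = div_factor_fine(chunk_size, usage_percent);

	return chunk_used < user_thresh ? 0 : 1;
}
```

For a 1 GiB chunk and a usage value of 50, the threshold is 512 MiB: a chunk holding 100 MiB is selected for balancing, while one holding 900 MiB (or exactly 512 MiB, since the comparison is strict) is skipped.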



[PATCH 05/21] Btrfs: add basic restriper infrastructure

2012-01-06 Thread Ilya Dryomov
Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc.  The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |6 +++
 fs/btrfs/disk-io.c |4 ++
 fs/btrfs/ioctl.c   |  114 ---
 fs/btrfs/ioctl.h   |   43 +++
 fs/btrfs/volumes.c |  113 ---
 fs/btrfs/volumes.h |   13 +-
 6 files changed, 252 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3f8f11e..c4d98c8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -934,6 +934,7 @@ struct btrfs_block_group_cache {
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
+struct btrfs_balance_control;
 struct btrfs_delayed_root;
 struct btrfs_fs_info {
u8 fsid[BTRFS_FSID_SIZE];
@@ -1159,6 +1160,11 @@ struct btrfs_fs_info {
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
 
+   /* restriper state */
+   spinlock_t balance_lock;
+   struct mutex balance_mutex;
+   struct btrfs_balance_control *balance_ctl;
+
unsigned data_chunk_allocations;
unsigned metadata_ratio;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ce9d0fb..190a1b2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2002,6 +2002,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
init_rwsem(&fs_info->scrub_super_lock);
fs_info->scrub_workers_refcnt = 0;
 
+   spin_lock_init(&fs_info->balance_lock);
+   mutex_init(&fs_info->balance_mutex);
+   fs_info->balance_ctl = NULL;
+
sb->s_blocksize = 4096;
sb->s_blocksize_bits = blksize_bits(4096);
sb->s_bdi = &fs_info->bdi;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c04f02c..221bae0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1203,13 +1203,21 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	mutex_lock(&root->fs_info->volume_mutex);
+	if (root->fs_info->balance_ctl) {
+		printk(KERN_INFO "btrfs: balance in progress\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args))
-		return PTR_ERR(vol_args);
+	if (IS_ERR(vol_args)) {
+		ret = PTR_ERR(vol_args);
+		goto out;
+	}
 
 	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
 
-	mutex_lock(&root->fs_info->volume_mutex);
 	sizestr = vol_args->name;
 	devstr = strchr(sizestr, ':');
 	if (devstr) {
@@ -1226,7 +1234,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		printk(KERN_INFO "btrfs: resizer unable to find device %llu\n",
 		       (unsigned long long)devid);
 		ret = -EINVAL;
-		goto out_unlock;
+		goto out_free;
 	}
 	if (!strcmp(sizestr, "max"))
 		new_size = device->bdev->bd_inode->i_size;
@@ -1241,7 +1249,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		new_size = memparse(sizestr, NULL);
 		if (new_size == 0) {
 			ret = -EINVAL;
-			goto out_unlock;
+			goto out_free;
 		}
 	}
 
@@ -1250,7 +1258,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 	if (mod < 0) {
 		if (new_size > old_size) {
 			ret = -EINVAL;
-			goto out_unlock;
+			goto out_free;
 		}
 		new_size = old_size - new_size;
 	} else if (mod > 0) {
@@ -1259,11 +1267,11 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 
 	if (new_size < 256 * 1024 * 1024) {
 		ret = -EINVAL;
-		goto out_unlock;
+		goto out_free;
 	}
 	if (new_size > device->bdev->bd_inode->i_size) {
 		ret = -EFBIG;
-		goto out_unlock;
+		goto out_free;
 	}
 
 	do_div(new_size, root->sectorsize);
@@ -1276,7 +1284,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		trans = btrfs_start_transaction(root, 0);
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
-			goto out_unlock;
+			goto out_free;
 		}
 		ret = btrfs_grow_device(trans, device, new_size);
 		btrfs_commit_transaction(trans, root);
@@ -1284,9 +1292,10 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,

[PATCH 09/21] Btrfs: devid filter

2012-01-06 Thread Ilya Dryomov
Relocate chunks which have at least one stripe located on a device with
devid X.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   23 +++
 fs/btrfs/volumes.h |1 +
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e252cf2..d52cfde 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2150,6 +2150,23 @@ static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
 	return ret;
 }
 
+static int chunk_devid_filter(struct extent_buffer *leaf,
+			      struct btrfs_chunk *chunk,
+			      struct btrfs_balance_args *bargs)
+{
+	struct btrfs_stripe *stripe;
+	int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+	int i;
+
+	for (i = 0; i < num_stripes; i++) {
+		stripe = btrfs_stripe_nr(chunk, i);
+		if (btrfs_stripe_devid(leaf, stripe) == bargs->devid)
+			return 0;
+	}
+
+	return 1;
+}
+
 static int should_balance_chunk(struct btrfs_root *root,
 				struct extent_buffer *leaf,
 				struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2183,6 +2200,12 @@ static int should_balance_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* devid filter */
+	if ((bargs->flags & BTRFS_BALANCE_ARGS_DEVID) &&
+	    chunk_devid_filter(leaf, chunk, bargs)) {
+		return 0;
+	}
+
 	return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 23b53bb..7007a31 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -201,6 +201,7 @@ struct map_lookup {
  */
 #define BTRFS_BALANCE_ARGS_PROFILES	(1ULL << 0)
 #define BTRFS_BALANCE_ARGS_USAGE	(1ULL << 1)
+#define BTRFS_BALANCE_ARGS_DEVID	(1ULL << 2)
 
 struct btrfs_balance_args;
 struct btrfs_balance_control {
-- 
1.7.6.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
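
For readers following the filter convention in patch 09, here is a minimal
userspace sketch. The demo_* structs and function are hypothetical stand-ins
for the chunk and stripe items, not btrfs code; only the return convention
(0 means the chunk matches and is kept, 1 means it is filtered out) is taken
from chunk_devid_filter() above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk stripe and chunk items. */
struct demo_stripe {
	uint64_t devid;
	uint64_t offset;
};

struct demo_chunk {
	uint64_t length;
	int num_stripes;
	const struct demo_stripe *stripes;
};

/*
 * Same convention as chunk_devid_filter(): return 0 when at least one
 * stripe of the chunk lives on @devid, 1 when none does.
 */
static int demo_devid_filter(const struct demo_chunk *chunk, uint64_t devid)
{
	int i;

	for (i = 0; i < chunk->num_stripes; i++) {
		if (chunk->stripes[i].devid == devid)
			return 0;
	}

	return 1;
}
```

A caller in the style of should_balance_chunk() would skip the chunk whenever
the filter is enabled and demo_devid_filter() returns nonzero.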


[PATCH 10/21] Btrfs: devid subset filter

2012-01-06 Thread Ilya Dryomov
Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   45 +
 fs/btrfs/volumes.h |1 +
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d52cfde..a5a2d65 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2167,6 +2167,45 @@ static int chunk_devid_filter(struct extent_buffer *leaf,
 	return 1;
 }
 
+/* [pstart, pend) */
+static int chunk_drange_filter(struct extent_buffer *leaf,
+			       struct btrfs_chunk *chunk,
+			       u64 chunk_offset,
+			       struct btrfs_balance_args *bargs)
+{
+	struct btrfs_stripe *stripe;
+	int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+	u64 stripe_offset;
+	u64 stripe_length;
+	int factor;
+	int i;
+
+	BUG_ON(!(bargs->flags & BTRFS_BALANCE_ARGS_DEVID));
+
+	if (btrfs_chunk_type(leaf, chunk) & (BTRFS_BLOCK_GROUP_DUP |
+	     BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10))
+		factor = 2;
+	else
+		factor = 1;
+	factor = num_stripes / factor;
+
+	for (i = 0; i < num_stripes; i++) {
+		stripe = btrfs_stripe_nr(chunk, i);
+		if (btrfs_stripe_devid(leaf, stripe) != bargs->devid)
+			continue;
+
+		stripe_offset = btrfs_stripe_offset(leaf, stripe);
+		stripe_length = btrfs_chunk_length(leaf, chunk);
+		do_div(stripe_length, factor);
+
+		if (stripe_offset < bargs->pend &&
+		    stripe_offset + stripe_length > bargs->pstart)
+			return 0;
+	}
+
+	return 1;
+}
+
 static int should_balance_chunk(struct btrfs_root *root,
 				struct extent_buffer *leaf,
 				struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2206,6 +2245,12 @@ static int should_balance_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* drange filter, makes sense only with devid filter */
+	if ((bargs->flags & BTRFS_BALANCE_ARGS_DRANGE) &&
+	    chunk_drange_filter(leaf, chunk, chunk_offset, bargs)) {
+		return 0;
+	}
+
 	return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 7007a31..75fb5f2 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -202,6 +202,7 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_PROFILES	(1ULL << 0)
 #define BTRFS_BALANCE_ARGS_USAGE	(1ULL << 1)
 #define BTRFS_BALANCE_ARGS_DEVID	(1ULL << 2)
+#define BTRFS_BALANCE_ARGS_DRANGE	(1ULL << 3)
 
 struct btrfs_balance_args;
 struct btrfs_balance_control {
-- 
1.7.6.3
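
The two pieces of arithmetic in patch 10 can be sketched in isolation: the
per-device stripe length and the half-open range overlap test. This is a
userspace illustration under stated assumptions; the demo_* helpers and the
"mirrored" parameter are hypothetical names, with mirrored standing in for
the DUP/RAID1/RAID10 check.

```c
#include <stdint.h>

/* 1 if [a_start, a_end) and [b_start, b_end) share at least one byte. */
static int demo_ranges_overlap(uint64_t a_start, uint64_t a_end,
			       uint64_t b_start, uint64_t b_end)
{
	return a_start < b_end && a_end > b_start;
}

/*
 * Bytes one stripe occupies on its device.  Mirrored profiles store
 * every byte twice, so only half of the stripes carry distinct data.
 */
static uint64_t demo_stripe_length(uint64_t chunk_length, int num_stripes,
				   int mirrored)
{
	int data_stripes = num_stripes / (mirrored ? 2 : 1);

	return chunk_length / data_stripes;
}
```

The overlap test is the same shape the vrange filter in patch 11 applies to
[chunk_offset, chunk_offset + chunk length) against [vstart, vend).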



[PATCH 11/21] Btrfs: virtual address space subset filter

2012-01-06 Thread Ilya Dryomov
Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   20 
 fs/btrfs/volumes.h |1 +
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a5a2d65..719e61d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2206,6 +2206,20 @@ static int chunk_drange_filter(struct extent_buffer *leaf,
 	return 1;
 }
 
+/* [vstart, vend) */
+static int chunk_vrange_filter(struct extent_buffer *leaf,
+			       struct btrfs_chunk *chunk,
+			       u64 chunk_offset,
+			       struct btrfs_balance_args *bargs)
+{
+	if (chunk_offset < bargs->vend &&
+	    chunk_offset + btrfs_chunk_length(leaf, chunk) > bargs->vstart)
+		/* at least part of the chunk is inside this vrange */
+		return 0;
+
+	return 1;
+}
+
 static int should_balance_chunk(struct btrfs_root *root,
 				struct extent_buffer *leaf,
 				struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2251,6 +2265,12 @@ static int should_balance_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* vrange filter */
+	if ((bargs->flags & BTRFS_BALANCE_ARGS_VRANGE) &&
+	    chunk_vrange_filter(leaf, chunk, chunk_offset, bargs)) {
+		return 0;
+	}
+
 	return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 75fb5f2..879afcb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -203,6 +203,7 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_USAGE	(1ULL << 1)
 #define BTRFS_BALANCE_ARGS_DEVID	(1ULL << 2)
 #define BTRFS_BALANCE_ARGS_DRANGE	(1ULL << 3)
+#define BTRFS_BALANCE_ARGS_VRANGE	(1ULL << 4)
 
 struct btrfs_balance_args;
 struct btrfs_balance_control {
-- 
1.7.6.3



[PATCH 12/21] Btrfs: do not reduce profile in do_chunk_alloc()

2012-01-06 Thread Ilya Dryomov
Every caller of do_chunk_alloc() feeds it the reduced allocation
profile, so stop trying to reduce it one more time.  Instead check the
validity of the passed profile.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |   18 ++
 fs/btrfs/extent-tree.c |2 +-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c4d98c8..1e7aea6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2536,6 +2536,24 @@ static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 	kfree(fs_info->super_for_commit);
 	kfree(fs_info);
 }
+/**
+ * profile_is_valid - tests whether a given profile is valid and reduced
+ * @flags: profile to validate
+ * @extended: if true @flags is treated as an extended profile
+ */
+static inline int profile_is_valid(u64 flags, int extended)
+{
+	u64 mask = ~BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+	flags &= ~BTRFS_BLOCK_GROUP_TYPE_MASK;
+	if (extended)
+		mask &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	if (flags & mask)
+		return 0;
+	/* true if zero or exactly one bit set */
+	return (flags & (~flags + 1)) == flags;
+}
 
 /* root-item.c */
 int btrfs_find_root_ref(struct btrfs_root *tree_root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 946b067..a1a18ea 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3295,7 +3295,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	int wait_for_alloc = 0;
 	int ret = 0;
 
-	flags = btrfs_reduce_alloc_profile(extent_root, flags);
+	BUG_ON(!profile_is_valid(flags, 0));
 
 	space_info = __find_space_info(extent_root->fs_info, flags);
 	if (!space_info) {
-- 
1.7.6.3
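
The last line of profile_is_valid() relies on a two's complement identity:
~x + 1 equals -x, and x & -x isolates the lowest set bit of x. If that
lowest bit is all of x, then x has at most one bit set. A standalone sketch
of just that test follows; demo_at_most_one_bit() is a hypothetical name
used for illustration.

```c
#include <stdint.h>

/*
 * Nonzero when @flags is zero or has exactly one bit set.
 * (~flags + 1) is the two's complement negation, so the AND keeps only
 * the lowest set bit; comparing with @flags rejects any extra bits.
 */
static int demo_at_most_one_bit(uint64_t flags)
{
	return (flags & (~flags + 1)) == flags;
}
```

In the patch this is what makes a profile "reduced": after the type bits are
masked off, no more than one RAID/DUP bit may remain.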



[PATCH 13/21] Btrfs: implement online profile changing

2012-01-06 Thread Ilya Dryomov
Profile changing is done by launching a balance with
BTRFS_BALANCE_CONVERT bits set and target fields of respective
btrfs_balance_args structs initialized.  Profile reducing code in this
case will pick restriper's target profile if it's available instead of
doing a blind reduce.  If target profile is not yet available it goes
back to a plain reduce.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/extent-tree.c |   56 ++-
 fs/btrfs/volumes.c |   69 
 fs/btrfs/volumes.h |5 +++
 3 files changed, 129 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a1a18ea..e6a832e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3030,7 +3030,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 /*
  * @flags: available profiles in extended format (see ctree.h)
  *
- * Returns reduced profile in chunk format.
+ * Returns reduced profile in chunk format.  If profile changing is in
+ * progress (either running or paused) picks the target profile (if it's
+ * already available), otherwise falls back to plain reducing.
  */
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 {
@@ -3042,6 +3044,34 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	u64 num_devices = root->fs_info->fs_devices->rw_devices +
 		root->fs_info->fs_devices->missing_devices;
 
+	/* pick restriper's target profile if it's available */
+	spin_lock(&root->fs_info->balance_lock);
+	if (root->fs_info->balance_ctl) {
+		struct btrfs_balance_control *bctl = root->fs_info->balance_ctl;
+		u64 tgt = 0;
+
+		if ((flags & BTRFS_BLOCK_GROUP_DATA) &&
+		    (bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+		    (flags & bctl->data.target)) {
+			tgt = BTRFS_BLOCK_GROUP_DATA | bctl->data.target;
+		} else if ((flags & BTRFS_BLOCK_GROUP_SYSTEM) &&
+			   (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+			   (flags & bctl->sys.target)) {
+			tgt = BTRFS_BLOCK_GROUP_SYSTEM | bctl->sys.target;
+		} else if ((flags & BTRFS_BLOCK_GROUP_METADATA) &&
+			   (bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+			   (flags & bctl->meta.target)) {
+			tgt = BTRFS_BLOCK_GROUP_METADATA | bctl->meta.target;
+		}
+
+		if (tgt) {
+			spin_unlock(&root->fs_info->balance_lock);
+			flags = tgt;
+			goto out;
+		}
+	}
+	spin_unlock(&root->fs_info->balance_lock);
+
 	if (num_devices == 1)
 		flags &= ~(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID0);
 	if (num_devices < 4)
@@ -3065,6 +3095,7 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
 	}
 
+out:
 	/* extended -> chunk profile */
 	flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 	return flags;
@@ -6795,6 +6826,29 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 	u64 stripped = BTRFS_BLOCK_GROUP_RAID0 |
 		BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10;
 
+	if (root->fs_info->balance_ctl) {
+		struct btrfs_balance_control *bctl = root->fs_info->balance_ctl;
+		u64 tgt = 0;
+
+		/* pick restriper's target profile and return */
+		if (flags & BTRFS_BLOCK_GROUP_DATA &&
+		    bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+			tgt = BTRFS_BLOCK_GROUP_DATA | bctl->data.target;
+		} else if (flags & BTRFS_BLOCK_GROUP_SYSTEM &&
+			   bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+			tgt = BTRFS_BLOCK_GROUP_SYSTEM | bctl->sys.target;
+		} else if (flags & BTRFS_BLOCK_GROUP_METADATA &&
+			   bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+			tgt = BTRFS_BLOCK_GROUP_METADATA | bctl->meta.target;
+		}
+
+		if (tgt) {
+			/* extended -> chunk profile */
+			tgt &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+			return tgt;
+		}
+	}
+
 	/*
 	 * we add in the count of missing devices because we want
 	 * to make sure that any RAID levels on a degraded FS
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 719e61d..ad94f26 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2430,6 +2430,75 @@ int btrfs_balance(struct btrfs_balance_control *bctl, 
int resume)
}
}
 
+   /*
+* Profile changing sanity checks.  Skip them if a simple
+* balance is requested.
+*/
+   
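
(The hunk above is cut off in the archive.) The target-picking rule that
patch 13 adds to btrfs_reduce_alloc_profile() can be modelled for a single
chunk type as below. This is only a sketch: the DEMO_* bit values and the
demo_args struct are illustrative assumptions, not definitions taken from
the patch.

```c
#include <stdint.h>

/* Illustrative bit assignments, assumed for this sketch only. */
#define DEMO_TYPE_DATA		(1ULL << 0)
#define DEMO_PROFILE_RAID0	(1ULL << 3)
#define DEMO_PROFILE_RAID1	(1ULL << 4)
#define DEMO_ARGS_CONVERT	(1ULL << 8)

struct demo_args {
	uint64_t flags;		/* DEMO_ARGS_* */
	uint64_t target;	/* profile being converted to */
};

/*
 * Return type | target when a conversion is in progress for @type and
 * the target profile is already among the available ones in @avail;
 * return 0 to tell the caller to fall back to a plain reduce.
 */
static uint64_t demo_pick_target(uint64_t avail, uint64_t type,
				 const struct demo_args *args)
{
	if ((avail & type) &&
	    (args->flags & DEMO_ARGS_CONVERT) &&
	    (avail & args->target))
		return type | args->target;

	return 0;
}
```

The third condition is what the commit message calls "if it's available":
until at least one chunk with the target profile exists, allocation keeps
using the plain reduce.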

[PATCH 14/21] Btrfs: soft profile changing mode (aka soft convert)

2012-01-06 Thread Ilya Dryomov
When converting from one profile to another with soft mode on, the
restriper won't touch chunks that already have the profile we are
converting to.  This is useful if, e.g., half of the FS was converted
earlier.

The soft mode switch is (like every other filter) per-type.  This means
that we can convert for example meta chunks the hard way while
converting data chunks selectively with soft switch.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/volumes.c |   22 ++
 fs/btrfs/volumes.h |6 ++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ad94f26..d1beac1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2220,6 +2220,22 @@ static int chunk_vrange_filter(struct extent_buffer *leaf,
 	return 1;
 }
 
+static int chunk_soft_convert_filter(u64 chunk_profile,
+				     struct btrfs_balance_args *bargs)
+{
+	BUG_ON(!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT));
+
+	chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+	if (chunk_profile == 0)
+		chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	if (bargs->target & chunk_profile)
+		return 1;
+
+	return 0;
+}
+
 static int should_balance_chunk(struct btrfs_root *root,
 				struct extent_buffer *leaf,
 				struct btrfs_chunk *chunk, u64 chunk_offset)
@@ -2271,6 +2287,12 @@ static int should_balance_chunk(struct btrfs_root *root,
 		return 0;
 	}
 
+	/* soft profile changing mode */
+	if ((bargs->flags & BTRFS_BALANCE_ARGS_SOFT) &&
+	    chunk_soft_convert_filter(chunk_type, bargs)) {
+		return 0;
+	}
+
 	return 1;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 13ff08d..9879a31 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -208,7 +208,13 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_DRANGE	(1ULL << 3)
 #define BTRFS_BALANCE_ARGS_VRANGE	(1ULL << 4)
 
+/*
+ * Profile changing flags.  When SOFT is set we won't relocate chunk if
+ * it already has the target profile (even though it may be
+ * half-filled).
+ */
 #define BTRFS_BALANCE_ARGS_CONVERT	(1ULL << 8)
+#define BTRFS_BALANCE_ARGS_SOFT		(1ULL << 9)
 
 struct btrfs_balance_args;
 struct btrfs_balance_control {
-- 
1.7.6.3
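
A userspace sketch of the soft convert decision from patch 14, for readers
who want to try the cases by hand. The DEMO_* constants are assumptions made
for this illustration (in particular the bit chosen for "single"), and
demo_soft_convert_filter() is a hypothetical name; the logic mirrors
chunk_soft_convert_filter() above.

```c
#include <stdint.h>

/* Illustrative values, assumed for this sketch only. */
#define DEMO_TYPE_DATA		(1ULL << 0)
#define DEMO_RAID0		(1ULL << 3)
#define DEMO_RAID1		(1ULL << 4)
#define DEMO_DUP		(1ULL << 5)
#define DEMO_RAID10		(1ULL << 6)
#define DEMO_PROFILE_MASK	(DEMO_RAID0 | DEMO_RAID1 | DEMO_DUP | DEMO_RAID10)
#define DEMO_SINGLE		(1ULL << 48)	/* stand-in for the "single" bit */

/*
 * Return 1 (skip the chunk) when it already has the target profile,
 * 0 (relocate it) otherwise.  A chunk with no profile bits is "single".
 */
static int demo_soft_convert_filter(uint64_t chunk_type, uint64_t target)
{
	uint64_t profile = chunk_type & DEMO_PROFILE_MASK;

	if (profile == 0)
		profile = DEMO_SINGLE;

	return (target & profile) ? 1 : 0;
}
```

Mapping "no profile bits" to an explicit single bit is what lets a convert to
single be expressed and tested with the same AND as every other target.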



[PATCH 15/21] Btrfs: save balance parameters to disk

2012-01-06 Thread Ilya Dryomov
Introduce a new btree objectid for storing balance item.  The reason is
to be able to resume restriper after a crash with the same parameters.
Balance item has a very high objectid and goes into tree of tree roots.

The key for the new item is as follows:

[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]

Older kernels simply ignore it so it's safe to mount with an older
kernel and then go back to the newer one.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |  133 +++-
 fs/btrfs/volumes.c |  100 +++
 2 files changed, 232 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1e7aea6..9997a59 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -86,6 +86,9 @@ struct btrfs_ordered_sum;
 /* holds checksums of all the data extents */
 #define BTRFS_CSUM_TREE_OBJECTID 7ULL
 
+/* for storing balance parameters in the root tree */
+#define BTRFS_BALANCE_OBJECTID -4ULL
+
 /* orhpan objectid for tracking unlinked/truncated files */
 #define BTRFS_ORPHAN_OBJECTID -5ULL
 
@@ -692,6 +695,54 @@ struct btrfs_root_ref {
__le16 name_len;
 } __attribute__ ((__packed__));
 
+struct btrfs_disk_balance_args {
+   /*
+* profiles to operate on, single is denoted by
+* BTRFS_AVAIL_ALLOC_BIT_SINGLE
+*/
+   __le64 profiles;
+
+   /* usage filter */
+   __le64 usage;
+
+   /* devid filter */
+   __le64 devid;
+
+   /* devid subset filter [pstart..pend) */
+   __le64 pstart;
+   __le64 pend;
+
+   /* btrfs virtual address space subset filter [vstart..vend) */
+   __le64 vstart;
+   __le64 vend;
+
+   /*
+* profile to convert to, single is denoted by
+* BTRFS_AVAIL_ALLOC_BIT_SINGLE
+*/
+   __le64 target;
+
+   /* BTRFS_BALANCE_ARGS_* */
+   __le64 flags;
+
+   __le64 unused[8];
+} __attribute__ ((__packed__));
+
+/*
+ * store balance parameters to disk so that balance can be properly
+ * resumed after crash or unmount
+ */
+struct btrfs_balance_item {
+   /* BTRFS_BALANCE_* */
+   __le64 flags;
+
+   struct btrfs_disk_balance_args data;
+   struct btrfs_disk_balance_args meta;
+   struct btrfs_disk_balance_args sys;
+
+   __le64 unused[4];
+} __attribute__ ((__packed__));
+
 #define BTRFS_FILE_EXTENT_INLINE 0
 #define BTRFS_FILE_EXTENT_REG 1
 #define BTRFS_FILE_EXTENT_PREALLOC 2
@@ -1409,6 +1460,8 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_DEV_ITEM_KEY 216
 #define BTRFS_CHUNK_ITEM_KEY   228
 
+#define BTRFS_BALANCE_ITEM_KEY 248
+
 /*
  * string items are for debugging.  They just store a short string of
  * data in the FS
@@ -2103,8 +2156,86 @@ BTRFS_SETGET_STACK_FUNCS(backup_bytes_used, struct 
btrfs_root_backup,
 BTRFS_SETGET_STACK_FUNCS(backup_num_devices, struct btrfs_root_backup,
   num_devices, 64);
 
-/* struct btrfs_super_block */
+/* struct btrfs_balance_item */
+BTRFS_SETGET_FUNCS(balance_flags, struct btrfs_balance_item, flags, 64);
+
+static inline void btrfs_balance_data(struct extent_buffer *eb,
+ struct btrfs_balance_item *bi,
+ struct btrfs_disk_balance_args *ba)
+{
+   read_eb_member(eb, bi, struct btrfs_balance_item, data, ba);
+}
+
+static inline void btrfs_set_balance_data(struct extent_buffer *eb,
+ struct btrfs_balance_item *bi,
+ struct btrfs_disk_balance_args *ba)
+{
+   write_eb_member(eb, bi, struct btrfs_balance_item, data, ba);
+}
+
+static inline void btrfs_balance_meta(struct extent_buffer *eb,
+ struct btrfs_balance_item *bi,
+ struct btrfs_disk_balance_args *ba)
+{
+   read_eb_member(eb, bi, struct btrfs_balance_item, meta, ba);
+}
+
+static inline void btrfs_set_balance_meta(struct extent_buffer *eb,
+ struct btrfs_balance_item *bi,
+ struct btrfs_disk_balance_args *ba)
+{
+   write_eb_member(eb, bi, struct btrfs_balance_item, meta, ba);
+}
+
+static inline void btrfs_balance_sys(struct extent_buffer *eb,
+struct btrfs_balance_item *bi,
+struct btrfs_disk_balance_args *ba)
+{
+   read_eb_member(eb, bi, struct btrfs_balance_item, sys, ba);
+}
+
+static inline void btrfs_set_balance_sys(struct extent_buffer *eb,
+struct btrfs_balance_item *bi,
+struct btrfs_disk_balance_args *ba)
+{
+   write_eb_member(eb, bi, struct btrfs_balance_item, sys, ba);
+}
 
+static inline void
+btrfs_disk_balance_args_to_cpu(struct btrfs_balance_args *cpu,
+  struct btrfs_disk_balance_args 
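
(The hunk above is cut off in the archive.) As a cross-check of the on-disk
layout introduced in patch 15, the structs can be mirrored in userspace and
their sizes computed: 9 named fields plus 8 unused words give 17 * 8 = 136
bytes per args block, and 8 + 3 * 136 + 32 = 448 bytes for the item. The
demo_* names are hypothetical, uint64_t stands in for __le64, and the
little-endian reader is an assumption about how a portable decoder might
look rather than anything from the patch.

```c
#include <stdint.h>

/* Userspace mirror of struct btrfs_disk_balance_args. */
struct demo_disk_balance_args {
	uint64_t profiles;
	uint64_t usage;
	uint64_t devid;
	uint64_t pstart;
	uint64_t pend;
	uint64_t vstart;
	uint64_t vend;
	uint64_t target;
	uint64_t flags;
	uint64_t unused[8];
} __attribute__ ((__packed__));

/* Userspace mirror of struct btrfs_balance_item. */
struct demo_balance_item {
	uint64_t flags;
	struct demo_disk_balance_args data;
	struct demo_disk_balance_args meta;
	struct demo_disk_balance_args sys;
	uint64_t unused[4];
} __attribute__ ((__packed__));

/* Decode a little-endian 64-bit value regardless of host byte order. */
static uint64_t demo_le64_to_cpu(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];

	return v;
}
```

The unused[] padding is what leaves room for new filters without changing
the item size.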

[PATCH 16/21] Btrfs: recover balance on mount

2012-01-06 Thread Ilya Dryomov
On mount, if balance item is found, resume balance in a separate
kernel thread.

Try to be smart and continue roughly where the previous balance (or
convert) was interrupted.  For chunk types that were being converted to
some profile we turn on soft convert; for a simple balance we turn on
the usage filter and relocate only less-than-90%-full chunks of that
type.  These are just heuristics, but they help quite a bit and can be
improved in the future.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/disk-io.c |4 ++
 fs/btrfs/volumes.c |  127 +++-
 fs/btrfs/volumes.h |1 +
 3 files changed, 131 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 190a1b2..eb7a11a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2427,6 +2427,10 @@ retry_root_backup:
 		if (!err)
 			err = btrfs_orphan_cleanup(fs_info->tree_root);
 		up_read(&fs_info->cleanup_work_sem);
+
+		if (!err)
+			err = btrfs_recover_balance(fs_info->tree_root);
+
 		if (err) {
 			close_ctree(tree_root);
 			return ERR_PTR(err);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1f9fdb7..c03df10 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -23,6 +23,7 @@
 #include <linux/random.h>
 #include <linux/iocontext.h>
 #include <linux/capability.h>
+#include <linux/kthread.h>
 #include <asm/div64.h>
 #include "compat.h"
 #include "ctree.h"
@@ -2165,6 +2166,46 @@ out:
 }
 
 /*
+ * This is a heuristic used to reduce the number of chunks balanced on
+ * resume after balance was interrupted.
+ */
+static void update_balance_args(struct btrfs_balance_control *bctl)
+{
+	/*
+	 * Turn on soft mode for chunk types that were being converted.
+	 */
+	if (bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT)
+		bctl->data.flags |= BTRFS_BALANCE_ARGS_SOFT;
+	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT)
+		bctl->sys.flags |= BTRFS_BALANCE_ARGS_SOFT;
+	if (bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT)
+		bctl->meta.flags |= BTRFS_BALANCE_ARGS_SOFT;
+
+	/*
+	 * Turn on usage filter if is not already used.  The idea is
+	 * that chunks that we have already balanced should be
+	 * reasonably full.  Don't do it for chunks that are being
+	 * converted - that will keep us from relocating unconverted
+	 * (albeit full) chunks.
+	 */
+	if (!(bctl->data.flags & BTRFS_BALANCE_ARGS_USAGE) &&
+	    !(bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT)) {
+		bctl->data.flags |= BTRFS_BALANCE_ARGS_USAGE;
+		bctl->data.usage = 90;
+	}
+	if (!(bctl->sys.flags & BTRFS_BALANCE_ARGS_USAGE) &&
+	    !(bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT)) {
+		bctl->sys.flags |= BTRFS_BALANCE_ARGS_USAGE;
+		bctl->sys.usage = 90;
+	}
+	if (!(bctl->meta.flags & BTRFS_BALANCE_ARGS_USAGE) &&
+	    !(bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT)) {
+		bctl->meta.flags |= BTRFS_BALANCE_ARGS_USAGE;
+		bctl->meta.usage = 90;
+	}
+}
+
+/*
  * Should be called with both balance and volume mutexes held to
  * serialize other volume operations (add_dev/rm_dev/resize) with
  * restriper.  Same goes for unset_balance_control.
@@ -2621,8 +2662,13 @@ do_balance:
 		goto out;
 	BUG_ON((ret == -EEXIST && !resume) || (ret != -EEXIST && resume));
 
-	if (!resume)
+	if (!resume) {
 		set_balance_control(bctl);
+	} else {
+		spin_lock(&fs_info->balance_lock);
+		update_balance_args(bctl);
+		spin_unlock(&fs_info->balance_lock);
+	}
 
 	mutex_unlock(&fs_info->balance_mutex);
 
@@ -2641,6 +2687,85 @@ out:
return ret;
 }
 
+static int restriper_kthread(void *data)
+{
+	struct btrfs_balance_control *bctl =
+			(struct btrfs_balance_control *)data;
+	struct btrfs_fs_info *fs_info = bctl->fs_info;
+	int ret;
+
+	mutex_lock(&fs_info->volume_mutex);
+	mutex_lock(&fs_info->balance_mutex);
+
+	set_balance_control(bctl);
+
+	printk(KERN_INFO "btrfs: continuing balance\n");
+	ret = btrfs_balance(bctl, 1);
+
+	mutex_unlock(&fs_info->balance_mutex);
+	mutex_unlock(&fs_info->volume_mutex);
+	return ret;
+}
+
+int btrfs_recover_balance(struct btrfs_root *tree_root)
+{
+	struct task_struct *tsk;
+	struct btrfs_balance_control *bctl;
+	struct btrfs_balance_item *item;
+	struct btrfs_disk_balance_args disk_bargs;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
+	if (!bctl) {
+   
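
(The hunk above is cut off in the archive.) The resume heuristic in
update_balance_args() treats each chunk type the same way, so it can be
sketched for one args block. The DEMO_* flag values and demo_args struct are
illustrative assumptions; the decision logic follows the patch: converting
types get soft mode, all others get a 90% usage filter unless a usage filter
was already set.

```c
#include <stdint.h>

/* Illustrative flag values, assumed for this sketch only. */
#define DEMO_ARGS_USAGE		(1ULL << 1)
#define DEMO_ARGS_CONVERT	(1ULL << 8)
#define DEMO_ARGS_SOFT		(1ULL << 9)

struct demo_args {
	uint64_t flags;
	uint64_t usage;
};

/* Adjust one per-type args block before resuming an interrupted balance. */
static void demo_update_args(struct demo_args *args)
{
	if (args->flags & DEMO_ARGS_CONVERT) {
		/* already-converted chunks will be skipped */
		args->flags |= DEMO_ARGS_SOFT;
		return;
	}

	if (!(args->flags & DEMO_ARGS_USAGE)) {
		/* already-balanced chunks should be reasonably full */
		args->flags |= DEMO_ARGS_USAGE;
		args->usage = 90;
	}
}
```

A user-supplied usage filter is left untouched, so resuming never widens the
set of chunks the original request selected.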

[PATCH 17/21] Btrfs: add skip_balance mount option

2012-01-06 Thread Ilya Dryomov
Since the restriper kthread starts involuntarily on mount and can suck
cpu and memory bandwidth, add a mount option to forcefully skip it.  The
restriper in that case hangs around in paused state and can be resumed
from userspace when it's convenient.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/super.c   |   11 +--
 fs/btrfs/volumes.c |   10 +++---
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9997a59..99eb2bc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1492,6 +1492,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_AUTO_DEFRAG		(1 << 16)
 #define BTRFS_MOUNT_INODE_MAP_CACHE	(1 << 17)
 #define BTRFS_MOUNT_RECOVERY		(1 << 18)
+#define BTRFS_MOUNT_SKIP_BALANCE	(1 << 19)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 34a8b61..063b521 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -164,8 +164,9 @@ enum {
Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
-   Opt_enospc_debug, Opt_subvolrootid, Opt_defrag,
-   Opt_inode_cache, Opt_no_space_cache, Opt_recovery, Opt_err,
+   Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
+   Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
+   Opt_err,
 };
 
 static match_table_t tokens = {
@@ -200,6 +201,7 @@ static match_table_t tokens = {
 	{Opt_inode_cache, "inode_cache"},
 	{Opt_no_space_cache, "nospace_cache"},
 	{Opt_recovery, "recovery"},
+	{Opt_skip_balance, "skip_balance"},
 	{Opt_err, NULL},
 };
 
@@ -398,6 +400,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			printk(KERN_INFO "btrfs: enabling auto recovery");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
 			break;
+		case Opt_skip_balance:
+			btrfs_set_opt(info->mount_opt, SKIP_BALANCE);
+			break;
 		case Opt_err:
 			printk(KERN_INFO "btrfs: unrecognized mount option "
 			       "'%s'\n", p);
@@ -723,6 +728,8 @@ static int btrfs_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",autodefrag");
 	if (btrfs_test_opt(root, INODE_MAP_CACHE))
 		seq_puts(seq, ",inode_cache");
+	if (btrfs_test_opt(root, SKIP_BALANCE))
+		seq_puts(seq, ",skip_balance");
 	return 0;
 }
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c03df10..c50a0af 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2692,15 +2692,19 @@ static int restriper_kthread(void *data)
 	struct btrfs_balance_control *bctl =
 			(struct btrfs_balance_control *)data;
 	struct btrfs_fs_info *fs_info = bctl->fs_info;
-	int ret;
+	int ret = 0;
 
 	mutex_lock(&fs_info->volume_mutex);
 	mutex_lock(&fs_info->balance_mutex);
 
 	set_balance_control(bctl);
 
-	printk(KERN_INFO "btrfs: continuing balance\n");
-	ret = btrfs_balance(bctl, 1);
+	if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
+		printk(KERN_INFO "btrfs: force skipping balance\n");
+	} else {
+		printk(KERN_INFO "btrfs: continuing balance\n");
+		ret = btrfs_balance(bctl, 1);
+	}
 
 	mutex_unlock(&fs_info->balance_mutex);
 	mutex_unlock(&fs_info->volume_mutex);
-- 
1.7.6.3
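
The mount-option plumbing in patch 17 comes down to one bit in a flags word
that is set by the option parser and tested by the restriper kthread. A
userspace sketch of that pattern follows; the demo_* macros imitate the
token-pasting style of btrfs_set_opt()/btrfs_test_opt(), and
demo_should_resume() with its have_balance_item parameter is a hypothetical
helper invented for this illustration.

```c
#define DEMO_MOUNT_RECOVERY		(1 << 18)
#define DEMO_MOUNT_SKIP_BALANCE		(1 << 19)

/* Token pasting lets callers name the option without the prefix. */
#define demo_set_opt(o, opt)	((o) |= DEMO_MOUNT_##opt)
#define demo_test_opt(o, opt)	((o) & DEMO_MOUNT_##opt)

/*
 * Decide whether an interrupted balance should be resumed on mount:
 * only when a balance item was found and skip_balance was not given.
 */
static int demo_should_resume(unsigned long mount_opt, int have_balance_item)
{
	if (!have_balance_item)
		return 0;

	return !demo_test_opt(mount_opt, SKIP_BALANCE);
}
```

With skip_balance set the balance state is still installed, which is why the
commit message describes the restriper as paused rather than cancelled.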



[PATCH 18/21] Btrfs: allow for pausing restriper

2012-01-06 Thread Ilya Dryomov
Implement an ioctl for pausing restriper.  This pauses the relocation,
but balance is still considered to be in progress: balance item is
not deleted, other volume operations cannot be started, etc.  If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount.  (It's safe to unmount when restriper is in
paused state; we will resume with the same parameters on the next
mount.)

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |4 
 fs/btrfs/disk-io.c |6 ++
 fs/btrfs/ioctl.c   |   23 ++-
 fs/btrfs/ioctl.h   |4 
 fs/btrfs/volumes.c |   52 ++--
 fs/btrfs/volumes.h |1 +
 6 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 99eb2bc..1afda75 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1214,7 +1214,10 @@ struct btrfs_fs_info {
/* restriper state */
spinlock_t balance_lock;
struct mutex balance_mutex;
+   atomic_t balance_running;
+   atomic_t balance_pause_req;
struct btrfs_balance_control *balance_ctl;
+   wait_queue_head_t balance_wait_q;
 
unsigned data_chunk_allocations;
unsigned metadata_ratio;
@@ -2658,6 +2661,7 @@ static inline int btrfs_fs_closing(struct btrfs_fs_info *fs_info)
 }
 static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 {
+	kfree(fs_info->balance_ctl);
 	kfree(fs_info->delayed_root);
 	kfree(fs_info->extent_root);
 	kfree(fs_info->tree_root);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb7a11a..8ce8374 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2004,7 +2004,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
 	spin_lock_init(&fs_info->balance_lock);
 	mutex_init(&fs_info->balance_mutex);
+	atomic_set(&fs_info->balance_running, 0);
+	atomic_set(&fs_info->balance_pause_req, 0);
 	fs_info->balance_ctl = NULL;
+	init_waitqueue_head(&fs_info->balance_wait_q);
 
 	sb->s_blocksize = 4096;
 	sb->s_blocksize_bits = blksize_bits(4096);
@@ -2980,6 +2983,9 @@ int close_ctree(struct btrfs_root *root)
 	fs_info->closing = 1;
 	smp_mb();
 
+	/* pause restriper - we want to resume on mount */
+	btrfs_pause_balance(root->fs_info);
+
 	btrfs_scrub_cancel(root);
 
 	/* wait for any defraggers to finish */
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e20d0cb..7830fae 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3081,6 +3081,11 @@ static long btrfs_ioctl_balance(struct btrfs_root *root, 
void __user *arg)
mutex_lock(fs_info-volume_mutex);
mutex_lock(fs_info-balance_mutex);
 
+   if (fs_info-balance_ctl) {
+   ret = -EINPROGRESS;
+   goto out;
+   }
+
bctl = kzalloc(sizeof(*bctl), GFP_NOFS);
if (!bctl) {
ret = -ENOMEM;
@@ -3108,7 +3113,8 @@ static long btrfs_ioctl_balance(struct btrfs_root *root, 
void __user *arg)
 
ret = btrfs_balance(bctl, 0);
/*
-* bctl is freed in __cancel_balance
+* bctl is freed in __cancel_balance or in free_fs_info if
+* restriper was paused all the way until unmount
 */
kfree(bargs);
 out:
@@ -3117,6 +3123,19 @@ out:
return ret;
 }
 
+static long btrfs_ioctl_balance_ctl(struct btrfs_root *root, int cmd)
+{
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   switch (cmd) {
+   case BTRFS_BALANCE_CTL_PAUSE:
+   return btrfs_pause_balance(root->fs_info);
+   }
+
+   return -EINVAL;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -3195,6 +3214,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_scrub_progress(root, argp);
case BTRFS_IOC_BALANCE_V2:
return btrfs_ioctl_balance(root, argp);
+   case BTRFS_IOC_BALANCE_CTL:
+   return btrfs_ioctl_balance_ctl(root, arg);
}
 
return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 0ca8059..f069138 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -109,6 +109,9 @@ struct btrfs_ioctl_fs_info_args {
__u64 reserved[124];/* pad to 1k */
 };
 
+/* balance control ioctl modes */
+#define BTRFS_BALANCE_CTL_PAUSE    1
+
 /*
  * this is packed, because it should be exactly the same as its disk
  * byte order counterpart (struct btrfs_disk_balance_args)
@@ -315,6 +318,7 @@ struct btrfs_ioctl_logical_ino_args {
   struct btrfs_ioctl_fs_info_args)
 #define BTRFS_IOC_BALANCE_V2 _IOW(BTRFS_IOCTL_MAGIC, 32, \
  struct btrfs_ioctl_balance_args)
+#define 

[PATCH 19/21] Btrfs: allow for cancelling restriper

2012-01-06 Thread Ilya Dryomov
Implement an ioctl for cancelling restriper.  Currently we wait until
relocation of the current block group is finished, in future this can be
done by triggering a commit.  Balance item is deleted and no memory
about the interrupted balance is kept.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |1 +
 fs/btrfs/ioctl.c   |2 ++
 fs/btrfs/ioctl.h   |1 +
 fs/btrfs/volumes.c |   47 ---
 fs/btrfs/volumes.h |1 +
 6 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1afda75..dfc136c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1216,6 +1216,7 @@ struct btrfs_fs_info {
struct mutex balance_mutex;
atomic_t balance_running;
atomic_t balance_pause_req;
+   atomic_t balance_cancel_req;
struct btrfs_balance_control *balance_ctl;
wait_queue_head_t balance_wait_q;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8ce8374..c23b82d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2006,6 +2006,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
mutex_init(&fs_info->balance_mutex);
atomic_set(&fs_info->balance_running, 0);
atomic_set(&fs_info->balance_pause_req, 0);
+   atomic_set(&fs_info->balance_cancel_req, 0);
fs_info->balance_ctl = NULL;
init_waitqueue_head(&fs_info->balance_wait_q);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7830fae..d47ff8e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3131,6 +3131,8 @@ static long btrfs_ioctl_balance_ctl(struct btrfs_root 
*root, int cmd)
switch (cmd) {
case BTRFS_BALANCE_CTL_PAUSE:
return btrfs_pause_balance(root->fs_info);
+   case BTRFS_BALANCE_CTL_CANCEL:
+   return btrfs_cancel_balance(root->fs_info);
}
 
return -EINVAL;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index f069138..33ec6d8 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -111,6 +111,7 @@ struct btrfs_ioctl_fs_info_args {
 
 /* balance control ioctl modes */
 #define BTRFS_BALANCE_CTL_PAUSE    1
+#define BTRFS_BALANCE_CTL_CANCEL   2
 
 /*
  * this is packed, because it should be exactly the same as its disk
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9d15819..f7248c9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2490,7 +2490,8 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
key.type = BTRFS_CHUNK_ITEM_KEY;
 
while (1) {
-   if (atomic_read(&fs_info->balance_pause_req)) {
+   if (atomic_read(&fs_info->balance_pause_req) ||
+   atomic_read(&fs_info->balance_cancel_req)) {
ret = -ECANCELED;
goto error;
}
@@ -2556,7 +2557,10 @@ error:
 
 static inline int balance_need_close(struct btrfs_fs_info *fs_info)
 {
-   return atomic_read(&fs_info->balance_pause_req) == 0;
+   /* cancel requested || normal exit path */
+   return atomic_read(&fs_info->balance_cancel_req) ||
+   (atomic_read(&fs_info->balance_pause_req) == 0 &&
+    atomic_read(&fs_info->balance_cancel_req) == 0);
 }
 
 static void __cancel_balance(struct btrfs_fs_info *fs_info)
@@ -2578,7 +2582,8 @@ int btrfs_balance(struct btrfs_balance_control *bctl, int 
resume)
int ret;
 
if (btrfs_fs_closing(fs_info) ||
-   atomic_read(&fs_info->balance_pause_req)) {
+   atomic_read(&fs_info->balance_pause_req) ||
+   atomic_read(&fs_info->balance_cancel_req)) {
ret = -EINVAL;
goto out;
}
@@ -2818,6 +2823,42 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
return ret;
 }
 
+int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
+{
+   mutex_lock(&fs_info->balance_mutex);
+   if (!fs_info->balance_ctl) {
+   mutex_unlock(&fs_info->balance_mutex);
+   return -ENOTCONN;
+   }
+
+   atomic_inc(&fs_info->balance_cancel_req);
+   /*
+* if we are running just wait and return, balance item is
+* deleted in btrfs_balance in this case
+*/
+   if (atomic_read(&fs_info->balance_running)) {
+   mutex_unlock(&fs_info->balance_mutex);
+   wait_event(fs_info->balance_wait_q,
+  atomic_read(&fs_info->balance_running) == 0);
+   mutex_lock(&fs_info->balance_mutex);
+   } else {
+   /* __cancel_balance needs volume_mutex */
+   mutex_unlock(&fs_info->balance_mutex);
+   mutex_lock(&fs_info->volume_mutex);
+   mutex_lock(&fs_info->balance_mutex);
+
+   if (fs_info->balance_ctl)
+   __cancel_balance(fs_info);
+
+   mutex_unlock(&fs_info->volume_mutex);
+   }
+
+   BUG_ON(fs_info->balance_ctl || 

[PATCH 20/21] Btrfs: allow for resuming restriper after it was paused

2012-01-06 Thread Ilya Dryomov
Implement an ioctl for resuming restriper.  We use the same heuristics
used when recovering balance after a crash to try to start where we left
off last time.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c   |2 ++
 fs/btrfs/ioctl.h   |1 +
 fs/btrfs/volumes.c |   27 +++
 fs/btrfs/volumes.h |1 +
 4 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d47ff8e..a83b1a5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3133,6 +3133,8 @@ static long btrfs_ioctl_balance_ctl(struct btrfs_root 
*root, int cmd)
return btrfs_pause_balance(root->fs_info);
case BTRFS_BALANCE_CTL_CANCEL:
return btrfs_cancel_balance(root->fs_info);
+   case BTRFS_BALANCE_CTL_RESUME:
+   return btrfs_resume_balance(root->fs_info);
}
 
return -EINVAL;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 33ec6d8..c691ef4 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -112,6 +112,7 @@ struct btrfs_ioctl_fs_info_args {
 /* balance control ioctl modes */
 #define BTRFS_BALANCE_CTL_PAUSE    1
 #define BTRFS_BALANCE_CTL_CANCEL   2
+#define BTRFS_BALANCE_CTL_RESUME   3
 
 /*
  * this is packed, because it should be exactly the same as its disk
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f7248c9..d907635 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2859,6 +2859,33 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
return 0;
 }
 
+int btrfs_resume_balance(struct btrfs_fs_info *fs_info)
+{
+   int ret;
+
+   if (fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   mutex_lock(&fs_info->volume_mutex);
+   mutex_lock(&fs_info->balance_mutex);
+
+   if (!fs_info->balance_ctl) {
+   ret = -ENOTCONN;
+   goto out;
+   }
+
+   if (atomic_read(&fs_info->balance_running)) {
+   ret = -EINPROGRESS;
+   goto out;
+   }
+
+   ret = btrfs_balance(fs_info->balance_ctl, 1);
+out:
+   mutex_unlock(&fs_info->balance_mutex);
+   mutex_unlock(&fs_info->volume_mutex);
+   return ret;
+}
+
 /*
  * shrinking a device means finding all of the device extents past
  * the new size, and then following the back refs to the chunks.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 4429efc..6271d8e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -273,6 +273,7 @@ int btrfs_balance(struct btrfs_balance_control *rctl, int 
resume);
 int btrfs_recover_balance(struct btrfs_root *tree_root);
 int btrfs_pause_balance(struct btrfs_fs_info *fs_info);
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
+int btrfs_resume_balance(struct btrfs_fs_info *fs_info);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
 int find_free_dev_extent(struct btrfs_trans_handle *trans,
 struct btrfs_device *device, u64 num_bytes,
-- 
1.7.6.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 21/21] Btrfs: add balance progress reporting

2012-01-06 Thread Ilya Dryomov
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/ioctl.c   |   51 +++
 fs/btrfs/ioctl.h   |6 ++
 fs/btrfs/volumes.c |   35 +--
 fs/btrfs/volumes.h |3 +++
 4 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a83b1a5..47abfc2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3140,6 +3140,55 @@ static long btrfs_ioctl_balance_ctl(struct btrfs_root 
*root, int cmd)
return -EINVAL;
 }
 
+static long btrfs_ioctl_balance_progress(struct btrfs_root *root,
+void __user *arg)
+{
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_ioctl_balance_args *bargs;
+   struct btrfs_balance_control *bctl;
+   int ret = 0;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   mutex_lock(&fs_info->balance_mutex);
+   if (!(bctl = fs_info->balance_ctl)) {
+   ret = -ENOTCONN;
+   goto out;
+   }
+
+   bargs = kzalloc(sizeof(*bargs), GFP_NOFS);
+   if (!bargs) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   bargs->flags = bctl->flags;
+
+   if (atomic_read(&fs_info->balance_running))
+   bargs->state |= BTRFS_BALANCE_STATE_RUNNING;
+   if (atomic_read(&fs_info->balance_cancel_req))
+   bargs->state |= BTRFS_BALANCE_STATE_CANCEL_REQ;
+   if (atomic_read(&fs_info->balance_pause_req))
+   bargs->state |= BTRFS_BALANCE_STATE_PAUSE_REQ;
+
+   memcpy(&bargs->data, &bctl->data, sizeof(bargs->data));
+   memcpy(&bargs->meta, &bctl->meta, sizeof(bargs->meta));
+   memcpy(&bargs->sys, &bctl->sys, sizeof(bargs->sys));
+
+   spin_lock(&fs_info->balance_lock);
+   memcpy(&bargs->stat, &bctl->stat, sizeof(bargs->stat));
+   spin_unlock(&fs_info->balance_lock);
+
+   if (copy_to_user(arg, bargs, sizeof(*bargs)))
+   ret = -EFAULT;
+
+   kfree(bargs);
+out:
+   mutex_unlock(&fs_info->balance_mutex);
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -3220,6 +3269,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_balance(root, argp);
case BTRFS_IOC_BALANCE_CTL:
return btrfs_ioctl_balance_ctl(root, arg);
+   case BTRFS_IOC_BALANCE_PROGRESS:
+   return btrfs_ioctl_balance_progress(root, argp);
}
 
return -ENOTTY;
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index c691ef4..04f8c1b 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -142,6 +142,10 @@ struct btrfs_balance_progress {
__u64 completed;/* # of chunks relocated so far */
 };
 
+#define BTRFS_BALANCE_STATE_RUNNING    (1ULL << 0)
+#define BTRFS_BALANCE_STATE_CANCEL_REQ (1ULL << 1)
+#define BTRFS_BALANCE_STATE_PAUSE_REQ  (1ULL << 2)
+
 struct btrfs_ioctl_balance_args {
__u64 flags;/* in/out */
__u64 state;/* out */
@@ -321,6 +325,8 @@ struct btrfs_ioctl_logical_ino_args {
 #define BTRFS_IOC_BALANCE_V2 _IOW(BTRFS_IOCTL_MAGIC, 32, \
  struct btrfs_ioctl_balance_args)
 #define BTRFS_IOC_BALANCE_CTL _IOW(BTRFS_IOCTL_MAGIC, 33, int)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 34, \
+   struct btrfs_ioctl_balance_args)
 #define BTRFS_IOC_INO_PATHS _IOWR(BTRFS_IOCTL_MAGIC, 35, \
struct btrfs_ioctl_ino_path_args)
 #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d907635..d7c5c7d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2439,6 +2439,7 @@ static u64 div_factor(u64 num, int factor)
 
 static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 {
+   struct btrfs_balance_control *bctl = fs_info->balance_ctl;
struct btrfs_root *chunk_root = fs_info->chunk_root;
struct btrfs_root *dev_root = fs_info->dev_root;
struct list_head *devices;
@@ -2454,6 +2455,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
int slot;
int ret;
int enospc_errors = 0;
+   bool counting = true;
 
/* step one make some room on all the devices */
devices = &fs_info->fs_devices->devices;
@@ -2485,12 +2487,18 @@ static int __btrfs_balance(struct btrfs_fs_info 
*fs_info)
ret = -ENOMEM;
goto error;
}
+
+   /* zero out stat counters */
+   spin_lock(&fs_info->balance_lock);
+   memset(&bctl->stat, 0, sizeof(bctl->stat));
+   spin_unlock(&fs_info->balance_lock);
+again:
key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
key.offset = (u64)-1;
key.type = BTRFS_CHUNK_ITEM_KEY;
 
while (1) {
-   if 

[PATCH] Btrfs-progs: restriper interface

2012-01-06 Thread Ilya Dryomov
Hello,

This is an update of userspace restriper interface.  The main change is
that restriper commands have been moved under balance prefix.  So now we
have:

btrfs fi balance start
btrfs fi balance pause
btrfs fi balance cancel
btrfs fi balance resume
btrfs fi balance status

This breaks btrfs-progs backwards compatibility: to get the old
balancing behaviour you have to call 'btrfs fi balance start' instead of
'btrfs fi balance'.  This is caused by stupidity of the core sub-command
matcher.  There are also some other problems with that parser and I'll
fix them all in one commit shortly.  After that the start will be
optional:

btrfs fi balance [options]
btrfs fi balance start [options]

Apart from that, some minor error handling fixes went in.

Here are some specs:

./btrfs fi balance start [-d[filters]] [-m[filters]] [-s[filters]]
[-vf] path

where 'filters' is a comma-separated list of filters (comma == AND):

o profiles={profiles mask} - profiles filter

profiles mask is a '|'-separated list ('|' == OR) of profiles

o usage={percentage} - usage filter

o devid={devid} - devid filter

o drange={start..end} - devid subset filter, it's tied to devid filter:
we say balance out range [start..end) on a particular devid.  These are
also acceptable:

drange=start.. - [start..end of device)
drange=..end - [start of device..end)

o vrange={start..end} - virtual address space subset filter.  Same forms
as above are acceptable.
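
The three range forms above can be parsed along these lines.  This is
only a sketch and not the btrfs-progs code: parse_range_sketch and
parse_u64_sketch are hypothetical names.  An omitted start maps to 0 and
an omitted end to the largest u64, as in the patch; rejecting a bare
".." and an empty range (start >= end) are assumptions of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse the first len bytes of str as a decimal u64; 0 on success, 1 on error. */
static int parse_u64_sketch(const char *str, size_t len, uint64_t *result)
{
	char buf[32];
	char *endptr;
	unsigned long long val;

	if (len == 0 || len >= sizeof(buf))
		return 1;
	memcpy(buf, str, len);
	buf[len] = '\0';
	/* strtoull accepts leading whitespace and signs; reject those here */
	if (buf[0] < '0' || buf[0] > '9')
		return 1;

	errno = 0;
	val = strtoull(buf, &endptr, 10);
	if (errno || *endptr)
		return 1;

	*result = val;
	return 0;
}

/*
 * Parse "start..end", "start.." or "..end" into the half-open range
 * [start, end).  Returns 0 on success, 1 on malformed input.
 */
static int parse_range_sketch(const char *range, uint64_t *start, uint64_t *end)
{
	const char *dots;

	assert(range != NULL && start != NULL && end != NULL);

	dots = strstr(range, "..");
	if (!dots)
		return 1;
	if (dots == range && dots[2] == '\0')
		return 1;               /* assumption: a bare ".." is rejected */

	if (dots == range)
		*start = 0;             /* "..end": from the start of the device */
	else if (parse_u64_sketch(range, (size_t)(dots - range), start))
		return 1;

	if (dots[2] == '\0')
		*end = UINT64_MAX;      /* "start..": up to the end of the device */
	else if (parse_u64_sketch(dots + 2, strlen(dots + 2), end))
		return 1;

	return *start < *end ? 0 : 1;
}
```

Unlike the helper in the patch, this version does not write into the
input string, so it can be called on a const argument.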

Convert (profile changing) is specified as follows:

o convert={profile},[soft]

soft parameter makes sense only for convert option.  It turns on
soft mode for profile changing, see the kernel patch.

Each chunk type can be either balanced or converted to some new profile.
By specifying some filters w/o convert option we balance chunks that
passed all filters (remember, comma == AND).  If only convert parameter
is specified we convert all chunks of that type.  If both convert and
filters are specified restriper filters out chunks according to the
given filters and then converts everything that passed through all the
filters.

By default system chunks are relocated along with meta chunks with the
same exact options.  To operate explicitly on system chunks -f (--force)
flag has to be specified.

Examples (somewhat contrived, but they demonstrate the flexibility of
the interface):

o ./btrfs fi balance start path

will balance everything (what ./btrfs fi balance path did)

o ./btrfs fi balance start -d

will balance only data chunks

o ./btrfs fi balance start -d -m

will balance everything (because -m by default applies to system chunks
too, see above)

o ./btrfs fi balance start -d -s -f

will balance all data and system chunks, won't touch meta chunks (note
that the force is used to operate explicitly on system chunks)

o ./btrfs fi balance start -dprofiles=raid1\|raid0

will balance only data chunks that have raid1 or raid0 profile
(\ - shell escape)

o ./btrfs fi balance start -mprofiles=raid1\|raid0,devid=2

will balance meta and sys chunks that have raid1 or raid0 profile and at
least one stripe located on device with devid 2

o ./btrfs fi balance start -s -musage=80,profiles=dup -f

will balance all system chunks and dup'ed metadata chunks which are less
than 80% full

o ./btrfs fi balance start -s -mprofiles=dup,convert=raid1 -f

will *balance* all system chunks and *convert* dup meta chunks to raid1

o ./btrfs fi balance start -dvrange=100..803337011,convert=raid0,soft

will soft-convert data chunks in that virtual address space range to
raid0

Note that you can't put a space between e.g. -m and a list of filters
because of the way getopt(3) works.  There are also long options, if you
prefer (--data, --metadata, --system), e.g:

./btrfs fi balance start --data=profiles=raid1\|raid0
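
The getopt(3) restriction comes from optional option arguments: an
optional argument is only recognised when it is attached to the option
("-mfilters" or "--metadata=filters").  A minimal sketch of such a
parser, assuming an option string of "d::m::s::vf"; the struct and
function names are hypothetical and the actual btrfs-progs code may
differ:

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the parsed options; not from btrfs-progs. */
struct balance_opts_sketch {
	int data, meta, sys;       /* chunk types selected on the command line */
	const char *data_filters;  /* NULL when no filter list was attached */
	const char *meta_filters;
	const char *sys_filters;
	int verbose, force;
};

/*
 * Returns the index of the first non-option argument (the path),
 * or -1 on an unknown option.
 */
static int parse_balance_opts(int argc, char **argv,
			      struct balance_opts_sketch *o)
{
	/* "::" and optional_argument are what force the attached form */
	static const struct option longopts[] = {
		{ "data",     optional_argument, NULL, 'd' },
		{ "metadata", optional_argument, NULL, 'm' },
		{ "system",   optional_argument, NULL, 's' },
		{ "verbose",  no_argument,       NULL, 'v' },
		{ "force",    no_argument,       NULL, 'f' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	assert(argv != NULL && o != NULL);
	*o = (struct balance_opts_sketch){ 0 };
	optind = 1;  /* restart scanning from the first argument */
	opterr = 0;  /* keep getopt quiet; the caller reports errors */

	while ((c = getopt_long(argc, argv, "d::m::s::vf", longopts, NULL)) != -1) {
		switch (c) {
		case 'd': o->data = 1; o->data_filters = optarg; break;
		case 'm': o->meta = 1; o->meta_filters = optarg; break;
		case 's': o->sys = 1;  o->sys_filters = optarg;  break;
		case 'v': o->verbose = 1; break;
		case 'f': o->force = 1;   break;
		default:  return -1;
		}
	}
	return optind;
}
```

With "-m profiles=dup" the filter list is not seen as an argument of -m
at all: -m is parsed with no filters and "profiles=dup" is left over as
a non-option argument in front of the path.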

All permutations are possible, restriper doesn't care about the order in
which options are given, all settings are per-chunk-type, and what you
do with one chunk type is completely independent of what you do with the
other.

The force flag also has to be given if you want to downgrade the
profile.  By downgrading I mean reducing the number of copies, so
raid10->raid1 can be done w/o this flag, while raid1->raid0 cannot.
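
The downgrade rule can be written down as a small check.  This is a
sketch under stated assumptions: the enum and both function names are
hypothetical, and the copy counts (two for dup, raid1 and raid10, one
for single and raid0) are filled in here for illustration rather than
taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

enum profile_sketch {
	PROF_SINGLE, PROF_DUP, PROF_RAID0, PROF_RAID1, PROF_RAID10,
	PROF_COUNT
};

/* Number of copies of each block kept by a profile (assumed table). */
static int copies_sketch(enum profile_sketch p)
{
	assert((int)p < PROF_COUNT);
	switch (p) {
	case PROF_DUP:
	case PROF_RAID1:
	case PROF_RAID10:
		return 2;
	default:
		return 1;
	}
}

/* A conversion is a downgrade, and needs -f, if it reduces the copy count. */
static bool convert_needs_force(enum profile_sketch from,
				enum profile_sketch to)
{
	return copies_sketch(to) < copies_sketch(from);
}
```

So convert_needs_force(PROF_RAID10, PROF_RAID1) is false, while
convert_needs_force(PROF_RAID1, PROF_RAID0) is true.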

And the management commands:

./btrfs fi balance cancel path
./btrfs fi balance pause path
./btrfs fi balance resume path

./btrfs fi balance status [-v] path

Patch is on top of master branch of btrfs-progs repo, available at:

git://github.com/idryomov/btrfs-progs.git restriper

Thanks,

Ilya


Ilya Dryomov (1):
  Btrfs-progs: add restriper commands

 btrfs.c  |   27 +++-
 btrfs_cmds.c |  577 +++---
 btrfs_cmds.h |8 +-
 ctree.h  |   23 ++-
 ioctl.h  |   53 ++
 print-tree.c |6 +
 volumes.h|   30 +++
 7 files changed, 687 insertions(+), 37 deletions(-)

-- 
1.7.6.3


[PATCH] Btrfs-progs: add restriper commands

2012-01-06 Thread Ilya Dryomov
Import restriper commands under btrfs fi balance:

  btrfs fi balance start
  btrfs fi balance cancel
  btrfs fi balance pause
  btrfs fi balance resume
  btrfs fi balance status

NOTE: Backwards compatibility is broken for now, to get the old balance
everything behaviour one has to call 'btrfs fi balance start' with no
options instead of 'btrfs fi balance'.  This is because btrfs utility
sub-command parser is not flexible enough.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 btrfs.c  |   27 +++-
 btrfs_cmds.c |  577 +++---
 btrfs_cmds.h |8 +-
 ctree.h  |   23 ++-
 ioctl.h  |   53 ++
 print-tree.c |6 +
 volumes.h|   30 +++
 7 files changed, 687 insertions(+), 37 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 1def354..b639e83 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -119,9 +119,30 @@ static struct Command commands[] = {
"Show space usage information for a mount point.",
  NULL
},
-   { do_balance, 1,
- "filesystem balance", "<path>\n"
-   "Balance the chunks across the device.",
+   { do_balance, -1,
+ "filesystem balance start", "[-d [filters]] [-m [filters]] "
+ "[-s [filters]] [-vf] <path>\n"
+   "Balance chunks across the devices.",
+ NULL
+   },
+   { do_balance_pause, 1,
+ "filesystem balance pause", "<path>\n"
+   "Pause running balance.",
+ NULL
+   },
+   { do_balance_cancel, 1,
+ "filesystem balance cancel", "<path>\n"
+   "Cancel running or paused balance.",
+ NULL
+   },
+   { do_balance_resume, 1,
+ "filesystem balance resume", "<path>\n"
+   "Resume interrupted balance.",
+ NULL
+   },
+   { do_balance_progress, -1,
+ "filesystem balance status", "[-v] <path>\n"
+   "Show status of running or paused balance.",
  NULL
},
{ do_change_label, -1,
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index b59e9cb..f3ea9ab 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -18,6 +18,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <getopt.h>
 #include <sys/ioctl.h>
 #include <sys/types.h>
 #include <dirent.h>
@@ -888,31 +889,6 @@ int do_add_volume(int nargs, char **args)
 
 }
 
-int do_balance(int argc, char **argv)
-{
-
-   int fdmnt, ret=0, e;
-   struct btrfs_ioctl_vol_args args;
-   char*path = argv[1];
-
-   fdmnt = open_file_or_dir(path);
-   if (fdmnt < 0) {
-   fprintf(stderr, "ERROR: can't access to '%s'\n", path);
-   return 12;
-   }
-
-   memset(&args, 0, sizeof(args));
-   ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, &args);
-   e = errno;
-   close(fdmnt);
-   if(ret<0){
-   fprintf(stderr, "ERROR: error during balancing '%s' - %s\n", 
-   path, strerror(e));
-
-   return 19;
-   }
-   return 0;
-}
 int do_remove_volume(int nargs, char **args)
 {
 
@@ -946,6 +922,557 @@ int do_remove_volume(int nargs, char **args)
return 0;
 }
 
+static int parse_one_profile(const char *profile, u64 *flags)
+{
+   if (!strcmp(profile, "raid0")) {
+   *flags |= BTRFS_BLOCK_GROUP_RAID0;
+   } else if (!strcmp(profile, "raid1")) {
+   *flags |= BTRFS_BLOCK_GROUP_RAID1;
+   } else if (!strcmp(profile, "raid10")) {
+   *flags |= BTRFS_BLOCK_GROUP_RAID10;
+   } else if (!strcmp(profile, "dup")) {
+   *flags |= BTRFS_BLOCK_GROUP_DUP;
+   } else if (!strcmp(profile, "single")) {
+   *flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+   } else {
+   fprintf(stderr, "Unknown profile '%s'\n", profile);
+   return 1;
+   }
+
+   return 0;
+}
+
+static int parse_profiles(char *profiles, u64 *flags)
+{
+   char *this_char;
+   char *save_ptr;
+
+   for (this_char = strtok_r(profiles, "|", &save_ptr);
+this_char != NULL;
+this_char = strtok_r(NULL, "|", &save_ptr)) {
+   if (parse_one_profile(this_char, flags))
+   return 1;
+   }
+
+   return 0;
+}
+
+static int parse_u64(const char *str, u64 *result)
+{
+   char *endptr;
+   u64 val;
+
+   val = strtoull(str, &endptr, 10);
+   if (*endptr)
+   return 1;
+
+   *result = val;
+   return 0;
+}
+
+static int parse_range(const char *range, u64 *start, u64 *end)
+{
+   char *dots;
+
+   dots = strstr(range, "..");
+   if (dots) {
+   const char *rest = dots + 2;
+   int skipped = 0;
+
+   *dots = 0;
+
+   if (!*rest) {
+   *end = (u64)-1;
+   skipped++;
+   } else {
+   if (parse_u64(rest, end))
+   return 1;
+   }
+   if (dots == range) {
+   *start = 0;
+ 

Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs

2012-01-06 Thread Josef Bacik
On Fri, Jan 06, 2012 at 11:51:16AM +0800, Miao Xie wrote:
 On thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
  +  trans = btrfs_start_transaction(root, 2);
  +  if (IS_ERR(trans))
  +  return PTR_ERR(trans);
   
 /*
  * setattr is responsible for setting the ordered_data_close flag,
  @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
  * using truncate to replace the contents of the file will
  * end up with a zero length file after a crash.
  */
  -  if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
  +  if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
 btrfs_add_ordered_operation(trans, root, inode);
 
 Since we have write out all the dirty page, we can drop the following code 
 which is
 in front of the while loop, and move the first btrfs_start_transaction() into 
 the loop,
 the logic of btrfs_truncate() will become simpler.
 
 while (1) {
  -  ret = btrfs_block_rsv_refill(root, rsv, min_size);
  -  if (ret) {
  -  /*
  -   * This can only happen with the original transaction we
  -   * started above, every other time we shouldn't have a
  -   * transaction started yet.
  -   */
  -  if (ret == -EAGAIN)
  -  goto end_trans;
  -  err = ret;
  -  break;
  -  }
  -
  
  Taking this part out is wrong, we need to have this slack space to account 
  for
  any COW that truncate does.  Other than that this looks pretty good.  
  Thanks,
  
 
 I think we can take this part out, because we start a new transaction every 
 time we
 do a truncation, and reserve enough space at that time. See below:
 

Ok let me rephrase.  The whole reason I do this is because the reservation stuff
is tricky, we may not actually use any of this space and so constantly going
back to reserve it makes us much more likely to fail our truncate() because of
ENOSPC.  But if we just hold onto a min size and then refill it when we need to
we lower the risk considerably, so this needs to stay.  Thanks,

Josef


[PATCH] btrfs: silence warning in raid array setup

2012-01-06 Thread David Sterba
Raid array setup code creates an extent buffer in the usual way. When the
PAGE_CACHE_SIZE is > super block size, the extent pages are not marked
up-to-date, which triggers a WARN_ON in the following
write_extent_buffer call. Add an explicit SetPageUptodate call to
silence the warning.

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/volumes.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4b839f..732ccc6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3699,6 +3699,20 @@ int btrfs_read_sys_array(struct btrfs_root *root)
return -ENOMEM;
btrfs_set_buffer_uptodate(sb);
btrfs_set_buffer_lockdep_class(root->root_key.objectid, sb, 0);
+   /*
+* The sb extent buffer is artificial and just used to read the system array.
+* btrfs_set_buffer_uptodate() call does not properly mark all its
+* pages up-to-date when the page is larger: extent does not cover the
+* whole page and consequently check_page_uptodate does not find all
+* the page's extents up-to-date (the hole beyond sb),
+* write_extent_buffer then triggers a WARN_ON.
+*
+* Regular short extents go through mark_extent_buffer_dirty/writeback cycle,
+* but sb spans only this function. Add an explicit SetPageUptodate call
+* to silence the warning eg. on PowerPC 64.
+*/
+   if (PAGE_CACHE_SIZE > BTRFS_SUPER_INFO_SIZE)
+   SetPageUptodate(extent_buffer_page(sb, 0));
 
write_extent_buffer(sb, super_copy, 0, BTRFS_SUPER_INFO_SIZE);
array_size = btrfs_super_sys_array_size(super_copy);
-- 
1.7.8



Re: Crash in io_ctl_drop_pages after mount with csum errors

2012-01-06 Thread David Sterba
On Fri, Jan 06, 2012 at 03:17:59PM +0800, Li Zefan wrote:
  [ 1499.946409] BUG: unable to handle kernel NULL pointer dereference at 
  0001
  [ 1499.946437] IP: [a0456dd7] io_ctl_drop_pages+0x37/0x70 [btrfs]
 
 0x01 is weird, don't know how it occurred. Nevertheless we need this fix:
 
 diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
 index ec23d43..81771ca 100644
 --- a/fs/btrfs/free-space-cache.c
 +++ b/fs/btrfs/free-space-cache.c
 @@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
   io_ctl_unmap_page(io_ctl);
  
   for (i = 0; i < io_ctl->num_pages; i++) {
 - ClearPageChecked(io_ctl->pages[i]);
 - unlock_page(io_ctl->pages[i]);
 - page_cache_release(io_ctl->pages[i]);
 + if (io_ctl->pages[i]) {
 + ClearPageChecked(io_ctl->pages[i]);
 + unlock_page(io_ctl->pages[i]);
 + page_cache_release(io_ctl->pages[i]);
 + }
   }
  }

mount did not crash with this fix, though anything that touches files
causes the crash. umount is still stuck the same way as before. I'll not
touch the partitions in case you have patches to test.


david


Re: [PATCH 0/2] btrfs: allow cross-subvolume BTRFS_IOC_CLONE

2012-01-06 Thread David Sterba
On Fri, Jan 06, 2012 at 02:04:12PM +0200, Konstantinos Skarlatos wrote:
 Me too wants cp --reflink across subvolumes. Please make this feature 
 available to us, as its a poor man's dedupe and would give big space 
 savings for many use cases.

The simple case of 'cp --reflink' works fine, the only remaining case is
the unimplemented clone of part of a compressed inline extent. And it has
to be:
  * at least 2 blocks uncompressed
  * the cloned range does not span the whole extent

The ioctl needs ranges and length aligned up to blocksize, ie. 4096;
cloning of short inline extents works.
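
As a sketch of that alignment requirement only (clone_args_aligned is a
hypothetical helper, not part of the ioctl interface, and it ignores any
special handling the kernel may apply to a range that ends at EOF):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when source offset, length and destination offset are all
 * multiples of blocksize.
 */
static bool clone_args_aligned(uint64_t src_off, uint64_t len,
			       uint64_t dst_off, uint64_t blocksize)
{
	/* blocksize must be a non-zero power of two for the mask trick */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return ((src_off | len | dst_off) & (blocksize - 1)) == 0;
}
```

Alignment is necessary but not sufficient: the clone_range calls in the
script below are all 4096-aligned and still fail with EINVAL.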

The missing case can be reproduced with this:

---
#!/bin/sh

# assume, that fs is mounted with compression on
src=test-clone-compressed-inline
dd if=/dev/zero of=$src bs=1K count=3 oflag=sync
sync
filefrag -vbs $src
clone_range $src 4096 4096 subvol2/$src-dest0 0
clone_range $src 4096 4096 subvol2/$src-dest4096 4096
clone_range $src 4096 4096 subvol2/$src-dest8192 8192

---

$ ./test-clone-range-inline
3+0 records in
3+0 records out
3072 bytes (3.1 kB) copied, 0.0487466 s, 63.0 kB/s
Filesystem type is: 9123683e
File size of test-clone-compressed-inline is 3072 (3 blocks, blocksize 1024)
 ext logical physical expected length flags
   0   004096 not_aligned,inline,eof
test-clone-compressed-inline: 1 extent found
clone_range test-clone-compressed-inline 3 4096~4096 to 
subvol2/test-clone-compressed-inline-dest0 4 0 = -1 Invalid argument
clone_range test-clone-compressed-inline 3 4096~4096 to 
subvol2/test-clone-compressed-inline-dest4096 4 4096 = -1 Invalid argument
clone_range test-clone-compressed-inline 3 4096~4096 to 
subvol2/test-clone-compressed-inline-dest8192 4 8192 = -1 Invalid argument

---

This does not work on a single subvolume, so extending clone to span
subvolumes should not break anything that hasn't been broken already.


david


[PATCH v3] btrfs-progs: Add ioctl to read compressed size of a file

2012-01-06 Thread David Sterba
Signed-off-by: David Sterba dste...@suse.cz
---
v2->v3: manpage updated, added <...> around file in command description

 btrfs.c|9 ++-
 btrfs_cmds.c   |   68 
 btrfs_cmds.h   |1 +
 ioctl.h|   13 ++
 man/btrfs.8.in |   10 
 5 files changed, 100 insertions(+), 1 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 1def354..d6a7665 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -128,7 +128,14 @@ static struct Command commands[] = {
  "filesystem label", "<device> [<newlabel>]\n"
  "With one argument, get the label of filesystem on <device>.\n"
  "If <newlabel> is passed, set the filesystem label to <newlabel>.\n"
- "The filesystem must be unmounted.\n"
+ "The filesystem must be unmounted."
+   },
+   { do_compr_size, -1,
+ "filesystem csize", "[-s start] [-e end] <file>\n"
+ "Read ordinary and compressed size of extents in the range [start,end)\n"
+ "-s start  range start inclusive, accepts K/M/G modifiers\n"
+ "-e end    range end exclusive, accepts K/M/G modifiers\n",
+ NULL
},
{ do_scrub_start, -1,
  scrub start, [-Bdqr] path|device\n
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index b59e9cb..1074ade 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -1305,3 +1305,71 @@ out:
free(inodes);
return ret;
 }
+
+int do_compr_size(int argc, char **argv)
+{
+   int ret;
+   int fd;
+   struct btrfs_ioctl_compr_size_args args;
+
+   args.start = 0;
+   args.end = (u64)-1;
+   optind = 1;
+   while (1) {
+   int c = getopt(argc, argv, "s:e:r");
+   if (c < 0)
+   break;
+   switch (c) {
+   case 's':
+   args.start = parse_size(optarg);
+   break;
+   case 'e':
+   args.end = parse_size(optarg);
+   break;
+   default:
+   fprintf(stderr, "ERROR: Invalid arguments for csize\n");
+   return 1;
+   }
+   }
+
+   if (args.start > args.end) {
+   fprintf(stderr, "ERROR: Invalid range for csize\n");
+   return 1;
+   }
+
+   if (argc - optind == 0) {
+   fprintf(stderr, "ERROR: Invalid arguments for csize\n");
+   return 1;
+   }
+   argc -= optind;
+
+   fd = open_file_or_dir(argv[optind]);
+   if (fd < 0) {
+   fprintf(stderr, "ERROR: can't access '%s'\n", argv[optind]);
+   return 1;
+   }
+
+   ret = ioctl(fd, BTRFS_IOC_COMPR_SIZE, &args);
+   if (ret < 0) {
+   fprintf(stderr, "ERROR: ioctl returned %d, errno %d %s\n",
+   ret, errno, strerror(errno));
+   return errno;
+   }
+
+   printf("File name: %s\n", argv[optind]);
+   if (args.end == (u64)-1)
+   printf("File range:%llu-EOF\n",
+   (unsigned long long)args.start);
+   else
+   printf("File range:%llu-%llu\n",
+   (unsigned long long)args.start,
+   (unsigned long long)args.end);
+
+   printf("Compressed size:   %llu\n",
+   (unsigned long long)(args.compressed_size << 9));
+   printf("Uncompressed size: %llu\n",
+   (unsigned long long)(args.size << 9));
+   printf("Ratio: %3.2f%%\n",
+   100.0 * args.compressed_size / args.size);
+   return 0;
+}
diff --git a/btrfs_cmds.h b/btrfs_cmds.h
index 81182b1..d171214 100644
--- a/btrfs_cmds.h
+++ b/btrfs_cmds.h
@@ -42,3 +42,4 @@ int open_file_or_dir(const char *fname);
 int do_ino_to_path(int nargs, char **argv);
 int do_logical_to_ino(int nargs, char **argv);
 char *path_for_root(int fd, u64 root);
+int do_compr_size(int argc, char **argv);
diff --git a/ioctl.h b/ioctl.h
index 1ae7537..5b5208a 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -224,6 +224,17 @@ struct btrfs_ioctl_logical_ino_args {
__u64   inodes;
 };
 
+struct btrfs_ioctl_compr_size_args {
+	/* Range start, inclusive */
+	__u64	start;			/* in */
+	/* Range end, exclusive */
+	__u64	end;			/* in */
+	__u64	size;			/* out */
+	__u64	compressed_size;	/* out */
+	__u64	reserved[2];
+};
+
+
 /* BTRFS_IOC_SNAP_CREATE is no longer used by the btrfs command */
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
@@ -277,5 +288,7 @@ struct btrfs_ioctl_logical_ino_args {
struct btrfs_ioctl_ino_path_args)
 #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \

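A note on the size reporting at the end of do_compr_size() in the patch above. The following is a minimal sketch, not part of the posted patch, of the byte conversion and ratio arithmetic. It assumes the ioctl returns both sizes in 512-byte units, and it adds a guard for an empty range, where dividing by the uncompressed size would otherwise divide by zero. The helper names (units_to_bytes, compr_ratio_percent, format_csize_report) and the choice to report 0% for an empty range are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: sizes come back in 512-byte units, so bytes = units << 9. */
#define UNIT_SHIFT 9

/* Convert a count of 512-byte units to bytes. */
static uint64_t units_to_bytes(uint64_t units)
{
	return units << UNIT_SHIFT;
}

/*
 * Compressed size as a percentage of the uncompressed size.
 * Returns 0.0 for an empty range instead of dividing by zero
 * (an illustrative choice, not taken from the patch).
 */
static double compr_ratio_percent(uint64_t compressed, uint64_t size)
{
	if (size == 0)
		return 0.0;
	return 100.0 * (double)compressed / (double)size;
}

/*
 * Format the three summary lines into buf.
 * Returns whatever snprintf() returns.
 */
static int format_csize_report(char *buf, size_t len,
			       uint64_t compressed, uint64_t size)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"Compressed size:   %llu\n"
			"Uncompressed size: %llu\n"
			"Ratio:             %3.2f%%\n",
			(unsigned long long)units_to_bytes(compressed),
			(unsigned long long)units_to_bytes(size),
			compr_ratio_percent(compressed, size));
}
```

A caller would pass the compressed_size and size fields of struct btrfs_ioctl_compr_size_args to format_csize_report() after a successful ioctl. Keeping the arithmetic in small helpers makes the zero-size case easy to check without a btrfs filesystem at hand.
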
Re: Honest timeline for btrfsck

2012-01-06 Thread Danny Piccirillo
Chris Mason <chris.mason at oracle.com> writes:

 
 So over the next two weeks I'm juggling the merge window and the fsck
 release.  My goal is to demo fsck at linuxcon europe.  Thanks again for
 all of your patience and help with Btrfs!
 

So we have a lot of new features, which is awesome, but btrfs is still not
ready for production use (I'm hoping that Fedora will actually be able to ship
with btrfs this time).

What's new? What are the next goals and hopeful deadlines? 

Best,
.danny

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs partition lost after RAID1 mirror disk failure?

2012-01-06 Thread C Anthony Risinger
On Wed, Jan 4, 2012 at 2:30 PM, Dan Garton dan.gar...@gmail.com wrote:

 Assuming that this is the case, do I stand a chance of retrieving that
 volume and accessing that data again?
 Or does destructive imply total loss? (In which case, I'll cut my
 losses)

unfortunately i really don't know enough to advise ... i did similar
actions a long time ago while experimenting for an initcpio-based
rollback utility, but in my case the FS was a dummy/loopback, so i
just burned it.  my only suggestion would be to try Josef's readonly
recovery/slurp utility and maybe you can pull some data off, since my
completely uninformed guess is the structures are 99% intact, but your
UUIDs no longer match, or internal top-level pointers point to the wrong
locations, etc.  perhaps someone more familiar with the actual
internals can be of more help -- good luck.

-- 

C Anthony