Re: wrong values in df and btrfs filesystem df

2011-04-12 Thread Miao Xie
On Mon, 11 Apr 2011 08:29:46 +0100, Stephane Chazelas wrote:
 2011-04-10 18:13:51 +0800, Miao Xie:
 [...]
 # df /srv/MM

 Filesystem      1K-blocks       Used  Available Use% Mounted on
 /dev/sdd1      5846053400 1593436456 2898463184  36% /srv/MM

 # btrfs filesystem df /srv/MM

 Data, RAID0: total=1.67TB, used=1.48TB
 System, RAID1: total=16.00MB, used=112.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=3.75GB, used=2.26GB

 # btrfs-show

 Label: MMedia  uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2
     Total devices 3 FS bytes used 1.48TB
     devid    3 size 1.81TB used 573.76GB path /dev/sdb1
     devid    2 size 1.81TB used 573.77GB path /dev/sde1
     devid    1 size 1.82TB used 570.01GB path /dev/sdd1

 Btrfs Btrfs v0.19

 

 df shows an Available value which isn't related to any real value.  

I _think_ that value is the amount of space not allocated to any
 block group. If that's so, then Available (from df) plus the three
 total values (from btrfs fi df) should equal the size value from df.

 This value excludes the space that cannot be allocated to any block group.
 This feature was implemented to fix a bug where the df command added disk
 space that can never be allocated to any block group into the Available value
 (see the changelog of commit 6d07bcec969af335d4e35b3921131b7929bd634e).

 This implementation works like a fake chunk allocation, but the fake
 allocation only allocates space from two of these three disks; it doesn't
 spread the stripes over all the disks that have enough space.
 [...]
 
 Hi Miao,
 
 would you care to expand a bit on that. In Helmut's case above
 where all the drives have at least 1.2TB free, how would there
 be un-allocatable space?
 
 What's the implication of having disks of differing sizes? Does
 that mean that the extra space on larger disks is lost?

I'm sorry that I couldn't explain it clearly.

As we know, Btrfs introduced a RAID function, and it can allocate stripes
from different disks to make up a RAID block group. But if there is not
enough disk space to allocate enough stripes, btrfs can't make up a new
block group, and the remaining disk space can never be used.

For example, suppose we have two disks, one 5GB and the other 10GB, and we
use RAID0 block groups to store the file data. A RAID0 block group needs at
least two stripes, each on a different disk. After all the space on the 5GB
disk is allocated, there is about 5GB of free space left on the 10GB disk.
This space cannot be used, because there is no free space on the other disk
to allocate from, so we can't make up a new RAID0 block group.

Besides the two-stripe limit, the chunk allocator will allocate stripes from
as many disks as possible to make up a new RAID0 block group. That is, if all
the disks have enough free space, the allocator will allocate stripes from
all the disks.

In Helmut's case, every time btrfs allocates a new chunk (block group), the
chunk allocator will allocate three same-size stripes from those three disks
to make up the new RAID0 block group, until there is no free space on two
disks. So btrfs can use most of the disk space for RAID0 block groups.

But the algorithm behind the df command doesn't simulate the above allocation
correctly. The simulated allocation only allocates stripes from two disks.
Once these two disks have no free space left, the third disk still has 1.2TB
of free space, but the df command thinks this space can't be used to make a
new RAID0 block group and ignores it. This is a bug, I think.
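
The difference between the two behaviours described above can be sketched in
user-space C. This is only a simplified model under stated assumptions, not
the kernel code: raid0_usable() and raid0_two_disk_estimate() are hypothetical
names, space is counted in arbitrary units, and metadata and system chunks are
ignored.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of RAID0 chunk allocation (illustration only).
 * free_space[] holds the unallocated space per disk and is updated in place.
 * Each round takes an equal-sized stripe from every disk that still has
 * free space, and the loop stops once fewer than two disks have space left.
 * Returns the total amount of space that ended up in RAID0 block groups.
 */
static uint64_t raid0_usable(uint64_t *free_space, size_t ndisks)
{
	uint64_t total = 0;

	for (;;) {
		size_t with_space = 0;
		uint64_t stripe = 0;

		/* The smallest non-zero free amount limits the stripe size. */
		for (size_t i = 0; i < ndisks; i++) {
			if (free_space[i] == 0)
				continue;
			if (with_space == 0 || free_space[i] < stripe)
				stripe = free_space[i];
			with_space++;
		}

		/* RAID0 needs stripes on at least two different disks. */
		if (with_space < 2)
			break;

		for (size_t i = 0; i < ndisks; i++) {
			if (free_space[i] != 0) {
				free_space[i] -= stripe;
				total += stripe;
			}
		}
	}
	return total;
}

/*
 * Rough model of the flawed estimate: only the two disks with the most
 * free space are considered, so space on any further disk is ignored.
 */
static uint64_t raid0_two_disk_estimate(const uint64_t *free_space,
					size_t ndisks)
{
	uint64_t first = 0, second = 0;	/* two largest free amounts */

	for (size_t i = 0; i < ndisks; i++) {
		if (free_space[i] > first) {
			second = first;
			first = free_space[i];
		} else if (free_space[i] > second) {
			second = free_space[i];
		}
	}
	/* Equal stripes on both disks, limited by the smaller of the two. */
	return 2 * second;
}
```

With a 5GB and a 10GB disk, raid0_usable() places 10GB in block groups and
leaves 5GB stranded on the larger disk. With three disks of equal free space,
the two-disk estimate reports only two thirds of what the allocator could
really use.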

BTW: The Available value is the amount of free space that we may be able to
use to store file data. In a btrfs filesystem it is hard to calculate,
because block groups are allocated dynamically: not all the free space on the
disks will be allocated to data block groups, some of it may be allocated to
metadata block groups. So we just tell the users how much free space they
may be able to use to store file data.

Thanks
Miao

 
 Thanks,
 Stephane
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: allocate extent state and check the result properly

2011-04-12 Thread Xiao Guangrong
The extent_state is not allocated, and the allocation result is not
checked, properly:
- in set_extent_bit, no extent_state is allocated if the path is not
  allowed to wait

- in clear_extent_bit, the result of the atomic allocation is not checked;
  we now trigger BUG_ON() if it fails

- if the allocation fails, we trigger BUG_ON instead of returning -ENOMEM,
  since the return value of clear_extent_bit() is ignored by many callers

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   29 +
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 77c65a0..62d5bca 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -439,6 +439,16 @@ static int clear_state_bit(struct extent_io_tree *tree,
return ret;
 }
 
+static struct extent_state *
+alloc_extent_state_atomic(struct extent_state *prealloc)
+{
+   if (!prealloc)
+   prealloc = alloc_extent_state(GFP_ATOMIC);
+
+   BUG_ON(!prealloc);
+   return prealloc;
+}
+
 /*
  * clear some bits on a range in the tree.  This may require splitting
  * or inserting elements in the tree, so the gfp mask is used to
@@ -476,8 +486,7 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 again:
if (!prealloc && (mask & __GFP_WAIT)) {
prealloc = alloc_extent_state(mask);
-   if (!prealloc)
-   return -ENOMEM;
+   BUG_ON(!prealloc);
}
 
spin_lock(&tree->lock);
@@ -529,8 +538,7 @@ hit_next:
 */
 
if (state->start < start) {
-   if (!prealloc)
-   prealloc = alloc_extent_state(GFP_ATOMIC);
+   prealloc = alloc_extent_state_atomic(prealloc);
err = split_state(tree, state, prealloc, start);
BUG_ON(err == -EEXIST);
prealloc = NULL;
@@ -551,8 +559,7 @@ hit_next:
 * on the first half
 */
if (state->start <= end && state->end > end) {
-   if (!prealloc)
-   prealloc = alloc_extent_state(GFP_ATOMIC);
+   prealloc = alloc_extent_state_atomic(prealloc);
err = split_state(tree, state, prealloc, end + 1);
BUG_ON(err == -EEXIST);
if (wake)
@@ -716,8 +723,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 again:
if (!prealloc && (mask & __GFP_WAIT)) {
prealloc = alloc_extent_state(mask);
-   if (!prealloc)
-   return -ENOMEM;
+   BUG_ON(!prealloc);
}
 
spin_lock(&tree->lock);
@@ -734,6 +740,7 @@ again:
 */
node = tree_search(tree, start);
if (!node) {
+   prealloc = alloc_extent_state_atomic(prealloc);
err = insert_state(tree, prealloc, start, end, bits);
prealloc = NULL;
BUG_ON(err == -EEXIST);
@@ -802,6 +809,8 @@ hit_next:
err = -EEXIST;
goto out;
}
+
+   prealloc = alloc_extent_state_atomic(prealloc);
err = split_state(tree, state, prealloc, start);
BUG_ON(err == -EEXIST);
prealloc = NULL;
@@ -832,6 +841,8 @@ hit_next:
this_end = end;
else
this_end = last_start - 1;
+
+   prealloc = alloc_extent_state_atomic(prealloc);
err = insert_state(tree, prealloc, start, this_end,
   bits);
BUG_ON(err == -EEXIST);
@@ -856,6 +867,8 @@ hit_next:
err = -EEXIST;
goto out;
}
+
+   prealloc = alloc_extent_state_atomic(prealloc);
err = split_state(tree, state, prealloc, end + 1);
BUG_ON(err == -EEXIST);
 
-- 
1.7.4
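
The "reuse the preallocated object, otherwise allocate, and never return
NULL" pattern that alloc_extent_state_atomic() introduces can be sketched in
user space as follows. This is a minimal sketch, not the kernel helper:
struct extent_state_sketch and get_state_or_die() are hypothetical names,
calloc() stands in for alloc_extent_state(GFP_ATOMIC), and abort() stands in
for BUG_ON().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct extent_state. */
struct extent_state_sketch {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Return the preallocated object if the caller has one, otherwise allocate
 * a fresh one. The function never returns NULL: an allocation failure
 * terminates the program, mirroring BUG_ON(!prealloc) in the patch.
 */
static struct extent_state_sketch *
get_state_or_die(struct extent_state_sketch *prealloc)
{
	if (!prealloc)
		prealloc = calloc(1, sizeof(*prealloc));

	if (!prealloc) {
		fprintf(stderr, "extent state allocation failed\n");
		abort();
	}
	return prealloc;
}
```

Centralising the check in one helper means every call site that previously
passed a possibly-NULL prealloc on to split_state() or insert_state() gets
the same guarantee.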


[PATCH 2/2] Btrfs: fix unsafe usage of merge_state

2011-04-12 Thread Xiao Guangrong
merge_state can free the current state if it can be merged with the next node,
but in set_extent_bit(), after merge_state, we still use the current extent to
get the next node and cache it into cached_state

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   22 ++
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 62d5bca..40cb450 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -769,20 +769,18 @@ hit_next:
if (err)
goto out;
 
+   next_node = rb_next(node);
cache_state(state, cached_state);
merge_state(tree, state);
if (last_end == (u64)-1)
goto out;
 
start = last_end + 1;
-   if (start < end && prealloc && !need_resched()) {
-   next_node = rb_next(node);
-   if (next_node) {
-   state = rb_entry(next_node, struct extent_state,
-rb_node);
-   if (state->start == start)
-   goto hit_next;
-   }
+   if (next_node && start < end && prealloc && !need_resched()) {
+   state = rb_entry(next_node, struct extent_state,
+rb_node);
+   if (state->start == start)
+   goto hit_next;
}
goto search_again;
}
@@ -843,14 +841,22 @@ hit_next:
this_end = last_start - 1;
 
prealloc = alloc_extent_state_atomic(prealloc);
+
+   /*
+* Avoid to free 'prealloc' if it can be merged with
+* the later extent.
+*/
+   atomic_inc(&prealloc->refs);
err = insert_state(tree, prealloc, start, this_end,
   bits);
BUG_ON(err == -EEXIST);
if (err) {
+   free_extent_state(prealloc);
prealloc = NULL;
goto out;
}
cache_state(prealloc, cached_state);
+   free_extent_state(prealloc);
prealloc = NULL;
start = this_end + 1;
goto search_again;
-- 
1.7.4
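
The hazard fixed here, reading the successor of a node after a merge may
already have freed that node, can be illustrated with a small user-space
sketch. It assumes a singly linked list instead of the kernel rbtree, and
struct range, merge_into_next() and merge_and_advance() are hypothetical
names; cur is assumed to be the list head so that no predecessor needs
fixing up.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an extent state: an inclusive byte range. */
struct range {
	unsigned long long start;
	unsigned long long end;
	struct range *next;
};

/*
 * Merge cur into its successor when the two ranges are adjacent.
 * On success cur is freed, so the caller must not touch it afterwards.
 * Returns 1 if a merge happened, 0 otherwise.
 */
static int merge_into_next(struct range *cur)
{
	struct range *next = cur->next;

	if (!next || cur->end + 1 != next->start)
		return 0;

	next->start = cur->start;
	free(cur);
	return 1;
}

/*
 * Safe ordering: fetch the successor first, then merge. Reading cur->next
 * after merge_into_next() would be a use-after-free whenever the merge
 * succeeded.
 */
static struct range *merge_and_advance(struct range *cur)
{
	struct range *next = cur->next;	/* taken while cur is still valid */

	merge_into_next(cur);
	return next;
}
```

This mirrors the first hunk of the patch, which moves the rb_next() call in
front of merge_state().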



Re: wrong values in df and btrfs filesystem df

2011-04-12 Thread Stephane Chazelas
2011-04-12 15:22:57 +0800, Miao Xie:
[...]
 But the algorithm of df command doesn't simulate the above allocation
 correctly, this simulated allocation just allocates the stripes from two
 disks, and then, these two disks have no free space, but the third disk
 still has 1.2TB free space, df command thinks this space can't be used to
 make a new RAID0 block group and ignores it. This is a bug, I think.
[...]

Thanks a lot Miao for the detailed explanation. So, the disk
space is not lost, it's just df not reporting the available
space correctly. That's me relieved.

It explains why I'm getting:

# blockdev --getsize64 /dev/sda4
2967698087424
# blockdev --getsize64 /dev/sdb
3000592982016
# blockdev --getsize64 /dev/sdc
3000592982016
# truncate -s 2967698087424 a
# truncate -s 3000592982016 b
# truncate -s 3000592982016 c
# losetup /dev/loop0 ./a
# losetup /dev/loop1 ./b
# losetup /dev/loop2 ./c
# mkfs.btrfs a b c
# btrfs device scan /dev/loop[0-2]
Scanning for Btrfs filesystems in '/dev/loop0'
Scanning for Btrfs filesystems in '/dev/loop1'
Scanning for Btrfs filesystems in '/dev/loop2'
# mount  /dev/loop0 /mnt/1
# df -k /mnt/1
Filesystem      1K-blocks  Used  Available Use% Mounted on
/dev/loop0     8758675828    56 5859474304   1% /mnt/1
# echo $(((8758675828 - 5859474304)*2**10))
2968782360576

One disk worth of space lost according to df.

While it should have been more something like
$(((3000592982016-2967698087424)*2)) (about 60GB), or about 0
after the quasi-round-robin allocation patch, right?
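
The two figures above can be checked with a few lines of C. This is just the
arithmetic from the shell session restated; df_missing_bytes() and
size_surplus_bytes() are hypothetical helper names.

```c
#include <stdint.h>

/* Space that df does not report as available: (1K-blocks - Available) KiB. */
static uint64_t df_missing_bytes(uint64_t total_kib, uint64_t avail_kib)
{
	return (total_kib - avail_kib) * 1024;
}

/*
 * Space that cannot be striped across all devices because the devices
 * differ in size: the surplus of every device over the smallest one.
 */
static uint64_t size_surplus_bytes(const uint64_t *sizes, int ndevs)
{
	uint64_t smallest = sizes[0];
	uint64_t surplus = 0;

	for (int i = 1; i < ndevs; i++)
		if (sizes[i] < smallest)
			smallest = sizes[i];

	for (int i = 0; i < ndevs; i++)
		surplus += sizes[i] - smallest;

	return surplus;
}
```

For the values shown, df_missing_bytes(8758675828, 5859474304) gives
2968782360576 bytes, roughly one whole disk, while the size surplus of the
three devices is only 65789789184 bytes.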

Best regards,
Stephane


Re: [RFC][PATCH] Btrfs: about chunk tree backups

2011-04-12 Thread WuBo

If no one has comments on this, I'll work on finishing it.

Thanks
Wubo

On 04/07/2011 03:57 PM, WuBo wrote:
 hi,all
 
 I've been digging into the idea of chunk tree backups. Here is the
 preliminary design: before a chunk allocation finishes, some information
 will be written into the first block of the chunk. This information will
 be useful for rebuilding the chunk tree after a crash. The first block
 will also be moved into fs_info->freed_extents[2], just like the super block.
 What we need to do is make some changes in these functions:
 btrfs_make_block_group
 btrfs_read_block_groups
 btrfs_remove_block_group
 What do you think about it?
 
 There's a difficulty with backward compatibility. mkfs.btrfs already
 creates several chunks when creating the fs, so it also needs to do
 the same thing as above. But it will be confusing in some situations,
 such as an old fs mounted on a new kernel. I think it's better to add an
 incompat flag in the super block to mark whether the fs was formatted with
 the new mkfs.btrfs.
 
 If that's OK, TODO list:
 - design the information in the chunk's first block so that it is unique
 - handle backward compatibility (for example: fix mkfs.btrfs)
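
The incompat-flag idea above can be sketched as a simple mask check. This is
a user-space illustration under stated assumptions: the bit value (1ULL << 4)
is the one proposed in this RFC, while the macro and function names here are
hypothetical and the real mount path does more than this one test.

```c
#include <stdint.h>

/* Incompat bits already present, numbered as in the ctree.h hunk below. */
#define SKETCH_INCOMPAT_MIXED_BACKREF		(1ULL << 0)
#define SKETCH_INCOMPAT_DEFAULT_SUBVOL		(1ULL << 1)
#define SKETCH_INCOMPAT_MIXED_GROUPS		(1ULL << 2)
#define SKETCH_INCOMPAT_COMPRESS_LZO		(1ULL << 3)
/* Bit proposed by this RFC. */
#define SKETCH_INCOMPAT_CHUNK_TREE_BACKUP	(1ULL << 4)

/*
 * Return 0 if every incompat bit set on the filesystem is known to the
 * running code, or -1 if the filesystem uses a feature it does not support.
 */
static int check_incompat_flags(uint64_t fs_flags, uint64_t supported)
{
	if (fs_flags & ~supported)
		return -1;
	return 0;
}
```

An old kernel that does not know the new bit would refuse a filesystem
formatted by the new mkfs.btrfs, while a new kernel can tell from the absent
bit that an old filesystem has no backup headers in its chunks.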
 
 Signed-off-by: Wu Bo wu...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.h   |   13 +++-
  fs/btrfs/extent-tree.c |  135 +-
  fs/btrfs/volumes.c |  168 
 
  fs/btrfs/volumes.h |   25 +++
  4 files changed, 322 insertions(+), 19 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 8b4b9d1..580dd1c 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -41,6 +41,7 @@ extern struct kmem_cache *btrfs_transaction_cachep;
  extern struct kmem_cache *btrfs_bit_radix_cachep;
  extern struct kmem_cache *btrfs_path_cachep;
  struct btrfs_ordered_sum;
 +struct map_lookup;
  
  #define BTRFS_MAGIC "_BHRfS_M"
  
 @@ -408,6 +409,7 @@ struct btrfs_super_block {
  #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL    (1ULL << 1)
  #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS      (1ULL << 2)
  #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO      (1ULL << 3)
 +#define BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP (1ULL << 4)
  
  #define BTRFS_FEATURE_COMPAT_SUPP        0ULL
  #define BTRFS_FEATURE_COMPAT_RO_SUPP     0ULL
 @@ -415,7 +417,8 @@ struct btrfs_super_block {
   (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |\
BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |  \
 -  BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
 +  BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |  \
 +  BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP)
  
  /*
   * A leaf is full of items. offset and size tell us where to find
 @@ -2172,10 +2175,12 @@ int btrfs_extent_readonly(struct btrfs_root *root, u64 bytenr);
  int btrfs_free_block_groups(struct btrfs_fs_info *info);
  int btrfs_read_block_groups(struct btrfs_root *root);
  int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr);
 +
  int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 -struct btrfs_root *root, u64 bytes_used,
 -u64 type, u64 chunk_objectid, u64 chunk_offset,
 -u64 size);
 +struct btrfs_root *root, struct map_lookup *map,
 +u64 bytes_used, u64 type, u64 chunk_objectid,
 +u64 chunk_offset, u64 size);
 +
  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 group_start);
  u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags);
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index f1db57d..27ea7d5 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -23,6 +23,7 @@
  #include <linux/rcupdate.h>
  #include <linux/kthread.h>
  #include <linux/slab.h>
 +#include <linux/buffer_head.h>
  #include "compat.h"
  #include "hash.h"
  #include "ctree.h"
 @@ -231,6 +232,113 @@ static int exclude_super_stripes(struct btrfs_root *root,
   return 0;
  }
  
 +static int exclude_chunk_stripes_header_slow(struct btrfs_root *root,
 + struct btrfs_block_group_cache *cache)
 +{
 + int i;
 + int nr;
 + u64 devid;
 + u64 physical;
 + int stripe_len;
 + u64 stripe_num;
 + u64 *logical;
 + struct btrfs_path *path;
 + struct btrfs_key key;
 + struct btrfs_chunk *chunk;
 + struct btrfs_key found_key;
 + struct extent_buffer *leaf;
 + int ret;
 +
 + ret = 0;
 + path = btrfs_alloc_path();
 + if (!path)
 + return -1;
 +
 + root = root->fs_info->chunk_root;
 +
 + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 + key.offset = cache->key.objectid;
 + key.type = BTRFS_CHUNK_ITEM_KEY;
 +
 + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 + if (ret != 0)
 + goto error;
 +
 + btrfs_item_key_to_cpu(path->nodes[0], 

[PATCH v2 1/3] btrfs: move btrfs_cmp_device_free_bytes to super.c

2011-04-12 Thread Arne Jansen
This function won't be used here anymore, so move it to super.c, where it is
used for the df calculation.

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/super.c   |   25 +
 fs/btrfs/volumes.c |   13 -
 fs/btrfs/volumes.h |   15 ---
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 58e7de9..6f5c426 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -889,6 +889,31 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
return 0;
 }
 
+/* Used to sort the devices by max_avail(descending sort) */
+int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2)
+{
+   if (((struct btrfs_device_info *)dev_info1)->max_avail >
+   ((struct btrfs_device_info *)dev_info2)->max_avail)
+   return -1;
+   else if (((struct btrfs_device_info *)dev_info1)->max_avail <
+((struct btrfs_device_info *)dev_info2)->max_avail)
+   return 1;
+   else
+   return 0;
+}
+
+/*
+ * sort the devices by max_avail, in which max free extent size of each device
+ * is stored.(Descending Sort)
+ */
+static inline void btrfs_descending_sort_devices(
+   struct btrfs_device_info *devices,
+   size_t nr_devices)
+{
+   sort(devices, nr_devices, sizeof(struct btrfs_device_info),
+btrfs_cmp_device_free_bytes, NULL);
+}
+
 /*
  * The helper to calc the free space on the devices that can be used to store
  * file data.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8b9fb8c..a9f1fc2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2282,19 +2282,6 @@ static noinline u64 chunk_bytes_by_type(u64 type, u64 calc_size,
return calc_size * num_stripes;
 }
 
-/* Used to sort the devices by max_avail(descending sort) */
-int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2)
-{
-   if (((struct btrfs_device_info *)dev_info1)->max_avail >
-   ((struct btrfs_device_info *)dev_info2)->max_avail)
-   return -1;
-   else if (((struct btrfs_device_info *)dev_info1)->max_avail <
-((struct btrfs_device_info *)dev_info2)->max_avail)
-   return 1;
-   else
-   return 0;
-}
-
 static int __btrfs_calc_nstripes(struct btrfs_fs_devices *fs_devices, u64 type,
 int *num_stripes, int *min_stripes,
 int *sub_stripes)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index cc2eada..b502f01 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -157,21 +157,6 @@ struct map_lookup {
struct btrfs_bio_stripe stripes[];
 };
 
-/* Used to sort the devices by max_avail(descending sort) */
-int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2);
-
-/*
- * sort the devices by max_avail, in which max free extent size of each device
- * is stored.(Descending Sort)
- */
-static inline void btrfs_descending_sort_devices(
-   struct btrfs_device_info *devices,
-   size_t nr_devices)
-{
-   sort(devices, nr_devices, sizeof(struct btrfs_device_info),
-btrfs_cmp_device_free_bytes, NULL);
-}
-
 int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
   u64 end, u64 *length);
 
-- 
1.7.3.4
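
The descending comparator moved by this patch follows the usual three-way
comparison convention. A user-space equivalent, assuming the C library's
qsort() in place of the kernel's sort() and a hypothetical struct
dev_info_sketch in place of struct btrfs_device_info, might look like this:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_device_info. */
struct dev_info_sketch {
	int devid;
	uint64_t max_avail;	/* largest free extent on the device */
};

/*
 * Descending order: a device with more available space sorts first, so the
 * comparator returns a negative value when the first argument is larger.
 * Explicit comparisons avoid the overflow that subtracting two u64 values
 * and truncating the result to int would cause.
 */
static int cmp_max_avail_desc(const void *a, const void *b)
{
	const struct dev_info_sketch *da = a;
	const struct dev_info_sketch *db = b;

	if (da->max_avail > db->max_avail)
		return -1;
	if (da->max_avail < db->max_avail)
		return 1;
	return 0;
}

static void sort_devices_desc(struct dev_info_sketch *devs, size_t ndevs)
{
	qsort(devs, ndevs, sizeof(*devs), cmp_max_avail_desc);
}
```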



[PATCH v2 0/3] btrfs: quasi-round-robin for chunk allocation

2011-04-12 Thread Arne Jansen
In a multi device setup, the chunk allocator currently always allocates
chunks on the devices in the same order. This leads to a very uneven
distribution, especially with RAID1 or RAID10 and an uneven number of
devices.
This patch always sorts the devices before allocating, and allocates the
stripes on the devices with the most available space, as long as there
is enough space available. In a low space situation, it first tries to
maximize striping.
The patch also simplifies the allocator and reduces the checks for
corner cases.
The simplification is done by several means. First, it defines the
properties of each RAID type upfront. These properties are used afterwards
instead of differentiating cases in several places.
Second, the old allocator defined a minimum stripe size for each block
group type, tried to find a large enough chunk, and if this fails just
allocates a smaller one. This is now done in one step. The largest possible
chunk (up to max_chunk_size) is searched and allocated.
Because we now have only one pass, the allocation of the map (struct
map_lookup) is moved down to the point where the number of stripes is
already known. This way we avoid reallocation of the map.
We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.
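
The core idea, placing each new chunk's stripes on the devices with the most
available space, can be sketched as follows. This is a simplified model and
not the code from patch 3/3: alloc_chunk_most_free() is a hypothetical name,
the stripe size is fixed by the caller, and the low-space striping logic and
STRIPE_SIZE rounding are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_STRIPES 8

/*
 * Allocate one chunk made of num_stripes stripes of stripe_size each,
 * taking every stripe from a different device and always preferring the
 * device with the most available space (lowest index wins a tie).
 * avail[] is updated only if the whole chunk fits.
 * Returns 0 on success and -1 if there is not enough room.
 */
static int alloc_chunk_most_free(uint64_t *avail, size_t ndevs,
				 size_t num_stripes, uint64_t stripe_size)
{
	size_t picked[SKETCH_MAX_STRIPES];

	if (num_stripes == 0 || num_stripes > SKETCH_MAX_STRIPES ||
	    num_stripes > ndevs)
		return -1;

	for (size_t s = 0; s < num_stripes; s++) {
		size_t best = ndevs;	/* ndevs means "none found yet" */

		for (size_t d = 0; d < ndevs; d++) {
			int used = 0;

			for (size_t p = 0; p < s; p++)
				if (picked[p] == d)
					used = 1;
			if (used)
				continue;
			if (best == ndevs || avail[d] > avail[best])
				best = d;
		}
		if (best == ndevs || avail[best] < stripe_size)
			return -1;	/* too few devices with room */
		picked[s] = best;
	}

	for (size_t s = 0; s < num_stripes; s++)
		avail[picked[s]] -= stripe_size;
	return 0;
}
```

With three devices of 3 units each and two-stripe (RAID1-like) chunks of
1 unit, this model fits four chunks before running out. A hypothetical
fixed-order allocator that always used the first two devices would stop
after three chunks and strand all 3 units on the third device.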

Changes from v1:
 - split into multiple parts
 - added some comments
 - generated with --patience for better readability

Arne Jansen (3):
  btrfs: move btrfs_cmp_device_free_bytes to super.c
  btrfs: heed alloc_start
  btrfs: quasi-round-robin for chunk allocation

 fs/btrfs/super.c   |   25 +++
 fs/btrfs/volumes.c |  492 ++--
 fs/btrfs/volumes.h |   16 +--
 3 files changed, 195 insertions(+), 338 deletions(-)

-- 
1.7.3.4



[PATCH v2 2/3] btrfs: heed alloc_start

2011-04-12 Thread Arne Jansen
Currently alloc_start is disregarded if the requested
chunk size is bigger than (device size - alloc_start),
but smaller than the device size.
The only situation where I see this could have made sense
was when a chunk equal to the size of the device had been
requested. This was possible as the allocator failed to
take alloc_start into account when calculating the requested
chunk size. As this gets fixed by this patch, the workaround
is not necessary anymore.

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/volumes.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a9f1fc2..45c592a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -849,10 +849,7 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
/* we don't want to overwrite the superblock on the drive,
 * so we make sure to start at an offset of at least 1MB
 */
-   search_start = 1024 * 1024;
-
-   if (root->fs_info->alloc_start + num_bytes <= search_end)
-   search_start = max(root->fs_info->alloc_start, search_start);
+   search_start = max(root->fs_info->alloc_start, 1024ull * 1024);
 
max_hole_start = search_start;
max_hole_size = 0;
-- 
1.7.3.4
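
The behavioural change can be seen by putting the old and new start
computations side by side. This is a user-space sketch with hypothetical
function names; the parameters mirror the values used in
find_free_dev_extent().

```c
#include <stdint.h>

/* Never start below 1MB, so the superblock is not overwritten. */
#define SKETCH_MIN_START (1024ULL * 1024)

/* Old logic: alloc_start is honoured only if the request fits behind it. */
static uint64_t search_start_old(uint64_t alloc_start, uint64_t num_bytes,
				 uint64_t search_end)
{
	uint64_t start = SKETCH_MIN_START;

	if (alloc_start + num_bytes <= search_end && alloc_start > start)
		start = alloc_start;
	return start;
}

/* New logic: alloc_start is always honoured, with 1MB as the lower bound. */
static uint64_t search_start_new(uint64_t alloc_start)
{
	return alloc_start > SKETCH_MIN_START ? alloc_start : SKETCH_MIN_START;
}
```

For a 100MB device with alloc_start at 16MB and a 90MB request, the old
computation falls back to 1MB and ignores alloc_start, while the new one
starts the search at 16MB.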



[PATCH v2 3/3] btrfs: quasi-round-robin for chunk allocation

2011-04-12 Thread Arne Jansen
In a multi device setup, the chunk allocator currently always allocates
chunks on the devices in the same order. This leads to a very uneven
distribution, especially with RAID1 or RAID10 and an uneven number of
devices.
This patch always sorts the devices before allocating, and allocates the
stripes on the devices with the most available space, as long as there
is enough space available. In a low space situation, it first tries to
maximize striping.
The patch also simplifies the allocator and reduces the checks for
corner cases.
The simplification is done by several means. First, it defines the
properties of each RAID type upfront. These properties are used afterwards
instead of differentiating cases in several places.
Second, the old allocator defined a minimum stripe size for each block
group type, tried to find a large enough chunk, and if this fails just
allocates a smaller one. This is now done in one step. The largest possible
chunk (up to max_chunk_size) is searched and allocated.
Because we now have only one pass, the allocation of the map (struct
map_lookup) is moved down to the point where the number of stripes is
already known. This way we avoid reallocation of the map.
We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/volumes.c |  474 +++-
 fs/btrfs/volumes.h |1 +
 2 files changed, 169 insertions(+), 306 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 45c592a..b309181 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2268,349 +2268,211 @@ static int btrfs_add_system_chunk(struct btrfs_trans_handle *trans,
return 0;
 }
 
-static noinline u64 chunk_bytes_by_type(u64 type, u64 calc_size,
-   int num_stripes, int sub_stripes)
-{
-   if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP))
-   return calc_size;
-   else if (type & BTRFS_BLOCK_GROUP_RAID10)
-   return calc_size * (num_stripes / sub_stripes);
-   else
-   return calc_size * num_stripes;
-}
-
-static int __btrfs_calc_nstripes(struct btrfs_fs_devices *fs_devices, u64 type,
-int *num_stripes, int *min_stripes,
-int *sub_stripes)
-{
-   *num_stripes = 1;
-   *min_stripes = 1;
-   *sub_stripes = 0;
-
-   if (type & (BTRFS_BLOCK_GROUP_RAID0)) {
-   *num_stripes = fs_devices->rw_devices;
-   *min_stripes = 2;
-   }
-   if (type & (BTRFS_BLOCK_GROUP_DUP)) {
-   *num_stripes = 2;
-   *min_stripes = 2;
-   }
-   if (type & (BTRFS_BLOCK_GROUP_RAID1)) {
-   if (fs_devices->rw_devices < 2)
-   return -ENOSPC;
-   *num_stripes = 2;
-   *min_stripes = 2;
-   }
-   if (type & (BTRFS_BLOCK_GROUP_RAID10)) {
-   *num_stripes = fs_devices->rw_devices;
-   if (*num_stripes < 4)
-   return -ENOSPC;
-   *num_stripes &= ~(u32)1;
-   *sub_stripes = 2;
-   *min_stripes = 4;
-   }
-
-   return 0;
-}
-
-static u64 __btrfs_calc_stripe_size(struct btrfs_fs_devices *fs_devices,
-   u64 proposed_size, u64 type,
-   int num_stripes, int small_stripe)
-{
-   int min_stripe_size = 1 * 1024 * 1024;
-   u64 calc_size = proposed_size;
-   u64 max_chunk_size = calc_size;
-   int ncopies = 1;
-
-   if (type & (BTRFS_BLOCK_GROUP_RAID1 |
-   BTRFS_BLOCK_GROUP_DUP |
-   BTRFS_BLOCK_GROUP_RAID10))
-   ncopies = 2;
-
-   if (type & BTRFS_BLOCK_GROUP_DATA) {
-   max_chunk_size = 10 * calc_size;
-   min_stripe_size = 64 * 1024 * 1024;
-   } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
-   max_chunk_size = 256 * 1024 * 1024;
-   min_stripe_size = 32 * 1024 * 1024;
-   } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
-   calc_size = 8 * 1024 * 1024;
-   max_chunk_size = calc_size * 2;
-   min_stripe_size = 1 * 1024 * 1024;
-   }
-
-   /* we don't want a chunk larger than 10% of writeable space */
-   max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
-max_chunk_size);
-
-   if (calc_size * num_stripes > max_chunk_size * ncopies) {
-   calc_size = max_chunk_size * ncopies;
-   do_div(calc_size, num_stripes);
-   do_div(calc_size, BTRFS_STRIPE_LEN);
-   calc_size *= BTRFS_STRIPE_LEN;
-   }
-
-   /* we don't want tiny stripes */
-   if (!small_stripe)
-   calc_size = max_t(u64, min_stripe_size, calc_size);
-
-   /*
-* we're about to do_div by the BTRFS_STRIPE_LEN so lets make sure
-* 

Re: [PATCH] mark internal functions static

2011-04-12 Thread Josef Bacik
On Tue, Apr 12, 2011 at 09:30:42AM +0800, Daniel J Blueman wrote:
 On 11 April 2011 23:45, Josef Bacik jo...@redhat.com wrote:
  On 04/11/2011 11:40 AM, Daniel J Blueman wrote:
 
  Hi Chris,
 
  This didn't make it in before, so updating to 2.6.39-rc2 and resending:
 
  Prevent needless exporting of internal functions from compilation
  units by marking them static.
 
 
  Looks like you have line wrapping on or something, the page looks mangled.
   Thanks,
 
 The only way I can solve this in gmail webmail is by attaching the patch.
 
 Is this acceptable? I guess if the mailing list strips patches, I
 guess using both may be a get-out-of-jail...


Try using git send-email, that works out well.  Thanks,

Josef 


Re: [PATCH] Btrfs: avoid taking the chunk_mutex in do_chunk_alloc

2011-04-12 Thread Josef Bacik
On Tue, Apr 12, 2011 at 09:33:03AM +0800, liubo wrote:
 On 04/12/2011 08:30 AM, Josef Bacik wrote:
  Every time we try to allocate disk space we try and see if we can
  pre-emptively allocate a chunk, but in the common case we don't allocate
  anything, so there is no sense in taking the chunk_mutex at all.  So
  instead if we are allocating a chunk, mark it in the space_info so we
  don't get two people trying to allocate at the same time.  Thanks,
  
  Signed-off-by: Josef Bacik jo...@redhat.com
  ---
   fs/btrfs/ctree.h   |5 +++--
   fs/btrfs/extent-tree.c |   24 ++--
   2 files changed, 25 insertions(+), 4 deletions(-)
  
  diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
  index 0d00a07..a566780 100644
  --- a/fs/btrfs/ctree.h
  +++ b/fs/btrfs/ctree.h
  @@ -740,10 +740,11 @@ struct btrfs_space_info {
   */
  unsigned long reservation_progress;
   
  -   int full;   /* indicates that we cannot allocate any more
  +   int full:1; /* indicates that we cannot allocate any more
 chunks for this space */
  -   int force_alloc;/* set if we need to force a chunk alloc for
  +   int force_alloc:1;  /* set if we need to force a chunk alloc for
 this space */
  +   int chunk_alloc:1;  /* set if we are allocating a chunk */
   
  struct list_head list;
   
  diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
  index f619c3c..80c048f 100644
  --- a/fs/btrfs/extent-tree.c
  +++ b/fs/btrfs/extent-tree.c
  @@ -3020,6 +3020,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
  found->bytes_may_use = 0;
  found->full = 0;
  found->force_alloc = 0;
  +   found->chunk_alloc = 0;
  *space_info = found;
  list_add_rcu(&found->list, &info->space_info);
  atomic_set(&found->caching_threads, 0);
  @@ -3273,10 +3274,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
   {
  struct btrfs_space_info *space_info;
  struct btrfs_fs_info *fs_info = extent_root->fs_info;
  +   int wait_for_alloc = 0;
  int ret = 0;
   
  -   mutex_lock(&fs_info->chunk_mutex);
  -
  flags = btrfs_reduce_alloc_profile(extent_root, flags);
   
  space_info = __find_space_info(extent_root->fs_info, flags);
  @@ -3287,6 +3287,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
  }
  BUG_ON(!space_info);
   
  +again:
  spin_lock(&space_info->lock);
  if (space_info->force_alloc)
  force = 1;
  @@ -3299,9 +3300,27 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 alloc_bytes)) {
  spin_unlock(&space_info->lock);
  goto out;
 
 hmm, the goto will lead to problems, because the out clause contains a
 mutex_unlock(), but at this point we have not taken the mutex yet.


Hrm, I wonder why xfstests didn't trip over that. That's what I get for
patching while watching the kid.  Thanks,

Josef 


[PATCH] Btrfs: avoid taking the chunk_mutex in do_chunk_alloc V2

2011-04-12 Thread Josef Bacik
Every time we try to allocate disk space we try and see if we can pre-emptively
allocate a chunk, but in the common case we don't allocate anything, so there is
no sense in taking the chunk_mutex at all.  So instead if we are allocating a
chunk, mark it in the space_info so we don't get two people trying to allocate
at the same time.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
V1-V2: Return in the case where we don't need to allocate a chunk instead of
going to out.
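
The decision made under the space_info spinlock in V2 can be summarised as a
small state function. This is a single-threaded sketch for reading along with
the diff below, not the kernel code: the struct, the enum and
decide_chunk_alloc() are hypothetical names, and locking is left to the
caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three flags in struct btrfs_space_info. */
struct space_info_sketch {
	bool full;		/* no more chunks can be allocated */
	bool force_alloc;	/* a chunk allocation has been requested */
	bool chunk_alloc;	/* someone is allocating a chunk right now */
};

enum alloc_decision {
	ALLOC_SKIP,	/* nothing to do, return without taking the mutex */
	ALLOC_WAIT,	/* another allocation is running: wait, then recheck */
	ALLOC_PROCEED	/* we own the allocation and set chunk_alloc */
};

/*
 * In a real implementation this would run with the space_info lock held.
 * should_alloc carries the result of the "do we need a new chunk" check.
 */
static enum alloc_decision
decide_chunk_alloc(struct space_info_sketch *si, bool force, bool should_alloc)
{
	if (si->force_alloc)
		force = true;

	if (si->full)
		return ALLOC_SKIP;

	if (!force && !should_alloc)
		return ALLOC_SKIP;

	if (si->chunk_alloc)
		return ALLOC_WAIT;

	si->chunk_alloc = true;
	return ALLOC_PROCEED;
}
```

Only the ALLOC_WAIT and ALLOC_PROCEED cases go on to take the chunk_mutex,
which is why the common "nothing to allocate" path no longer contends on it.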

 fs/btrfs/ctree.h   |5 +++--
 fs/btrfs/extent-tree.c |   29 -
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d00a07..a566780 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -740,10 +740,11 @@ struct btrfs_space_info {
 */
unsigned long reservation_progress;
 
-   int full;   /* indicates that we cannot allocate any more
+   int full:1; /* indicates that we cannot allocate any more
   chunks for this space */
-   int force_alloc;/* set if we need to force a chunk alloc for
+   int force_alloc:1;  /* set if we need to force a chunk alloc for
   this space */
+   int chunk_alloc:1;  /* set if we are allocating a chunk */
 
struct list_head list;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..362cc9b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3020,6 +3020,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 	found->bytes_may_use = 0;
 	found->full = 0;
 	found->force_alloc = 0;
+	found->chunk_alloc = 0;
 	*space_info = found;
 	list_add_rcu(&found->list, &info->space_info);
 	atomic_set(&found->caching_threads, 0);
@@ -3273,10 +3274,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_space_info *space_info;
 	struct btrfs_fs_info *fs_info = extent_root->fs_info;
+	int wait_for_alloc = 0;
 	int ret = 0;
 
-	mutex_lock(&fs_info->chunk_mutex);
-
 	flags = btrfs_reduce_alloc_profile(extent_root, flags);
 
 	space_info = __find_space_info(extent_root->fs_info, flags);
@@ -3287,21 +3287,40 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	}
 	BUG_ON(!space_info);
 
+again:
 	spin_lock(&space_info->lock);
 	if (space_info->force_alloc)
 		force = 1;
 	if (space_info->full) {
 		spin_unlock(&space_info->lock);
-		goto out;
+		return 0;
 	}
 
 	if (!force && !should_alloc_chunk(extent_root, space_info,
 					  alloc_bytes)) {
 		spin_unlock(&space_info->lock);
-		goto out;
+		return 0;
+	} else if (space_info->chunk_alloc) {
+		wait_for_alloc = 1;
+	} else {
+		space_info->chunk_alloc = 1;
 	}
 	spin_unlock(&space_info->lock);
 
+	mutex_lock(&fs_info->chunk_mutex);
+
+	/*
+	 * The chunk_mutex is held throughout the entirety of a chunk
+	 * allocation, so once we've acquired the chunk_mutex we know that the
+	 * other guy is done and we need to recheck and see if we should
+	 * allocate.
+	 */
+	if (wait_for_alloc) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		wait_for_alloc = 0;
+		goto again;
+	}
+
 	/*
 	 * If we have mixed data/metadata chunks we want to make sure we keep
 	 * allocating mixed chunks instead of individual chunks.
@@ -3327,9 +3346,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 		space_info->full = 1;
 	else
 		ret = 1;
+	space_info->chunk_alloc = 0;
 	space_info->force_alloc = 0;
 	spin_unlock(&space_info->lock);
-out:
 	mutex_unlock(&extent_root->fs_info->chunk_mutex);
 	return ret;
 }
-- 
1.7.2.3
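
The locking scheme in the patch above can be tried out in userspace. What
follows is only a minimal sketch under stated assumptions, not the kernel
code: a pthread mutex stands in for the space_info spinlock, and the names
space_info_init(), should_alloc(), do_chunk_alloc_sketch() and the "wanted"
field are hypothetical. It shows the three paths: return early without
touching chunk_mutex, become the allocator, or wait on chunk_mutex and then
recheck.

```c
#include <pthread.h>

/* Simplified stand-in for struct btrfs_space_info. */
struct space_info {
	pthread_mutex_t lock;	/* plays the role of the spinlock */
	int full;
	int chunk_alloc;	/* set while one caller is allocating */
	int chunks;		/* chunks allocated so far */
	int wanted;		/* hypothetical target used by should_alloc() */
};

/* Plays the role of fs_info->chunk_mutex. */
static pthread_mutex_t chunk_mutex = PTHREAD_MUTEX_INITIALIZER;

static void space_info_init(struct space_info *si, int wanted)
{
	pthread_mutex_init(&si->lock, NULL);
	si->full = 0;
	si->chunk_alloc = 0;
	si->chunks = 0;
	si->wanted = wanted;
}

/* Hypothetical policy: allocate until the target is reached. */
static int should_alloc(const struct space_info *si)
{
	return si->chunks < si->wanted;
}

/* Returns 1 if this call allocated a chunk, 0 otherwise. */
static int do_chunk_alloc_sketch(struct space_info *si)
{
	int wait_for_alloc;

again:
	wait_for_alloc = 0;
	pthread_mutex_lock(&si->lock);
	if (si->full || !should_alloc(si)) {
		/* Common case: return without touching chunk_mutex. */
		pthread_mutex_unlock(&si->lock);
		return 0;
	}
	if (si->chunk_alloc)
		wait_for_alloc = 1;	/* someone else is allocating */
	else
		si->chunk_alloc = 1;	/* we are the allocator */
	pthread_mutex_unlock(&si->lock);

	pthread_mutex_lock(&chunk_mutex);
	if (wait_for_alloc) {
		/* The other allocator may be done; drop the mutex and recheck. */
		pthread_mutex_unlock(&chunk_mutex);
		goto again;
	}

	/* The real chunk allocation would happen here. */
	pthread_mutex_lock(&si->lock);
	si->chunks++;
	si->chunk_alloc = 0;
	pthread_mutex_unlock(&si->lock);

	pthread_mutex_unlock(&chunk_mutex);
	return 1;
}
```

With a target of two chunks, the first two calls allocate and every later
call returns 0 from the early-exit path, which is the case the patch is
trying to make cheap.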



Kernel BUG at fs/btrfs/inode.c:2281 4665

2011-04-12 Thread maria

The computer was idle when the first bug happened and after the reboot
btrfs can't be mounted. It can't delete orphans and replay the log.
It would be nice if I can get the data out, there is nothing important
there but it would be nice. I was actually fixing typos in the backup
scripts when the bug attacked :)

I have written this by hand so there may be typos. I skipped the parts
that I don't think are necessary; the rest is on paper in case I missed
something important.

kernel 2.6.39-rc2

Kernel BUG at fs/btrfs/inode.c:2281
EIP is at btrfs_orphan_del+0xa8/0xc1
Call Trace:
btrfs_orphan_cleanup+0x18b/0x2a5
btrfs_lookup_dentry+0x32f/0x357
btrfs_lookup+0xb/0x22
d_alloc_and_lookup+0x38/0x4f
walk_component+0x131/0x2b0
? btrfs_getxattr+0x2f/0x5b
path_lookupat+0x9a/0x2af
do_path_lookup+0x33/0x8b
user_path_at+0x3b/0x61
? putname+0x25/0x2e
? putname+0x25/0x2e
? user_path_at+0x44/0x61
vfs_fstatat+0x51/0x78
vfs_lstat+0x16/0x18
sys_lstat64+0x14/0x28
? vfs_mount_lock_local_unlock+0x20/0x2b
? mntput_no_expire+0x53/0x110
? mntput+0x19/0x1b
? path_put+0x15/0x18
? sys_getxattr+0x3f/0x4c
sysenter_do_call+0x12/0x22

Kernel BUG at fs/btrfs/inode.c:4665
EIP is at btrfs_add_link+0x11f/0x188
Call Trace:
add_inode_ref+0x226/0x29d
? __kmap_atomic+0xe/0x10
replay_one_buffer+0x165/0x1db
walk_down_log_tree+0x155/0x2ac
walk_log_tree+0x63/0x162
? _raw_spin_unlock+0x14/0x1f
btrfs_recover_log_trees+0x15c/0x23e
? replay_one_extent+0x518/0x518
open_ctree+0xe77/0x1102
? strcpy+0x13/0x2e
btrfs_mount+0x2ab/0x622
? ida_get_new_above+0x14c/0x166
mount_fs+0xe/0x95
vfs_kern_mount+0x4c/0x79
do_kern_mount+0x2f/0xae
? notify_page_fault+0x5f/0x5f
? copy_mount_options+0x73/0xd2
sys_mount+0x61/0x8f
sysenter_do_call+0x12/0x22

// Maria




Re: [PATCH] Btrfs: avoid taking the chunk_mutex in do_chunk_alloc

2011-04-12 Thread David Sterba
Hi,

On Mon, Apr 11, 2011 at 08:30:24PM -0400, Josef Bacik wrote:
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 0d00a07..a566780 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -740,10 +740,11 @@ struct btrfs_space_info {
*/
   unsigned long reservation_progress;
  
 - int full;   /* indicates that we cannot allocate any more
 + int full:1; /* indicates that we cannot allocate any more
  chunks for this space */
 - int force_alloc;/* set if we need to force a chunk alloc for
 + int force_alloc:1;  /* set if we need to force a chunk alloc for
  this space */
 + int chunk_alloc:1;  /* set if we are allocating a chunk */
  
   struct list_head list;
  

please make the bitfields unsigned.

david
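
The reason behind this request: a signed one-bit field can represent only 0
and -1, so storing 1 in it does not read back as 1. The patch only tests the
flags for truth, which works either way, but a later comparison against 1
would silently fail. A minimal standalone sketch follows; the struct and
function names are hypothetical stand-ins, not kernel code, and the -1 result
is what GCC and Clang produce (the C standard leaves this out-of-range
conversion implementation-defined).

```c
/* Hypothetical stand-ins for the flags in struct btrfs_space_info. */
struct flags_signed {
	signed int chunk_alloc:1;	/* can hold only 0 and -1 */
};

struct flags_unsigned {
	unsigned int chunk_alloc:1;	/* can hold 0 and 1 */
};

/* Store v in a signed one-bit field and read it back. */
static int roundtrip_signed(int v)
{
	struct flags_signed f = { 0 };

	f.chunk_alloc = v;
	return f.chunk_alloc;
}

/* Store v in an unsigned one-bit field and read it back. */
static int roundtrip_unsigned(int v)
{
	struct flags_unsigned f = { 0 };

	f.chunk_alloc = v;
	return f.chunk_alloc;
}
```

On those compilers roundtrip_signed(1) yields -1 while roundtrip_unsigned(1)
yields 1, so only the unsigned variant is safe to compare with == 1.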


Re: [PATCH v5 1/8] btrfs: Balance progress monitoring

2011-04-12 Thread David Sterba
Hi,

I've noticed that Arne's scrub patches add scrub variables directly
into the fs_info structure, while you have a separate struct.

I was wondering whether it would be better to put the items of
btrfs_balance_info into fs_info too, since balance state is global info.

Although fs_info is a huge structure now, 9402 bytes on x86_64, there is
no space saving in this case.

On Sun, Apr 10, 2011 at 10:06:04PM +0100, Hugo Mills wrote:
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 7f78cc7..17c7ecc 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -865,6 +865,11 @@ struct btrfs_block_group_cache {
   struct list_head cluster_list;
  };
  
 +struct btrfs_balance_info {
 + u32 expected;
 + u32 completed;

two u32 make one u64

 +};
 +
  struct reloc_control;
  struct btrfs_device;
  struct btrfs_fs_devices;
 @@ -1078,6 +1083,10 @@ struct btrfs_fs_info {
  
   /* filesystem state */
   u64 fs_state;
 +
 + /* Keep track of any rebalance operations on this FS */
 + spinlock_t balance_info_lock;
 + struct btrfs_balance_info *balance_info;

a pointer is a u64 too

  };
  
  /*
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index dd13eb8..bb2ffed 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -2051,6 +2052,20 @@ int btrfs_balance(struct btrfs_root *dev_root)
  	mutex_lock(&dev_root->fs_info->volume_mutex);
  	dev_root = dev_root->fs_info->dev_root;
  
 +	bal_info = kmalloc(
 +		sizeof(struct btrfs_balance_info),
 +		GFP_NOFS);

... drop

 +	if (!bal_info) {
 +		ret = -ENOMEM;
 +		goto error_no_status;
 +	}
 +	spin_lock(&dev_root->fs_info->balance_info_lock);
 +	dev_root->fs_info->balance_info = bal_info;
 +	bal_info->expected = -1; /* One less than actually counted,
 +				    because chunk 0 is special */
 +	bal_info->completed = 0;
 +	spin_unlock(&dev_root->fs_info->balance_info_lock);
 +
  	/* step one make some room on all the devices */
  	list_for_each_entry(device, devices, dev_list) {
  		old_size = device->total_bytes;
 @@ -2115,10 +2157,20 @@ int btrfs_balance(struct btrfs_root *dev_root)
  					   found_key.offset);
  		BUG_ON(ret && ret != -ENOSPC);
  		key.offset = found_key.offset - 1;
 +		spin_lock(&dev_root->fs_info->balance_info_lock);
 +		bal_info->completed++;
 +		spin_unlock(&dev_root->fs_info->balance_info_lock);
 +		printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
 +		       bal_info->completed, bal_info->expected);
  	}
  	ret = 0;
  error:
  	btrfs_free_path(path);
 +	spin_lock(&dev_root->fs_info->balance_info_lock);
 +	kfree(dev_root->fs_info->balance_info);

... drop

 +	dev_root->fs_info->balance_info = NULL;
 +	spin_unlock(&dev_root->fs_info->balance_info_lock);
 +error_no_status:
  	mutex_unlock(&dev_root->fs_info->volume_mutex);
  	return ret;
  }
 -- 
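
The size argument in the review comments above ("two u32 make one u64", "a
pointer is a u64 too") can be checked with a small standalone sketch. The
struct names are simplified, hypothetical stand-ins for the kernel
structures, and uint32_t stands in for the kernel's u32. On a 64-bit build
the pointer variant and the embedded variant both occupy 8 bytes, so
embedding costs nothing extra and avoids the kmalloc()/kfree() pair; note
that an embedded struct can no longer use a NULL pointer to mean "no balance
running", so that state would need to be tracked some other way.

```c
#include <stddef.h>
#include <stdint.h>

/* Same two counters as btrfs_balance_info in the quoted patch. */
struct balance_info {
	uint32_t expected;
	uint32_t completed;
};

/* Variant from the patch: fs_info holds a pointer to an allocated struct. */
struct fs_info_with_pointer {
	struct balance_info *balance_info;
};

/* Variant suggested in review: the counters live directly in fs_info. */
struct fs_info_embedded {
	struct balance_info balance_info;
};

/* Size of the embedded variant's balance state, in bytes. */
static size_t embedded_size(void)
{
	return sizeof(struct fs_info_embedded);
}

/* Size of the pointer variant's balance state, in bytes (4 on 32-bit, 8 on 64-bit). */
static size_t pointer_size(void)
{
	return sizeof(struct fs_info_with_pointer);
}
```

This only holds while the struct stays at two 32-bit counters; once more
fields are added, the embedded variant grows while the pointer does not.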


Re: Kernel BUG at fs/btrfs/inode.c:2281 4665

2011-04-12 Thread Peter Stuge
Hej Maria!

ma...@ponstudios.se wrote:
 The computer was idle when the first bug happened

Ouch.


 and after the reboot btrfs can't be mounted.

Can you get some messages out of btrfs tools run against the file
system when offline, or an image of it? Maybe hook the fs up to
another machine.


//Peter


Re: [PATCH v5 1/8] btrfs: Balance progress monitoring

2011-04-12 Thread Hugo Mills
On Tue, Apr 12, 2011 at 07:12:32PM +0200, David Sterba wrote:
 Hi,
 
 I've noticed that Arne's scrub patches add scrub variables directly
 into the fs_info structure, while you have a separate struct.

   Chris (I think -- might have been Josef) suggested the use of a
struct, back when I was first writing this code.

 I was wondering whether it would be better to put the items of
 btrfs_balance_info into fs_info too, since balance state is global info.
 
 Although fs_info is a huge structure now, 9402 bytes on x86_64, there is
 no space saving in this case.

   There will be savings in the future, however -- when I add Li's
suggestion for tracking the number of bytes (in the block groups as a
whole, and in terms of useful data stored), plus the vaddr of the
last-moved block group, the size of the btrfs_balance_info struct will
go up from its current 8 bytes to 48. I've just not quite finished
that patch yet, and wanted to get the rest of the patches settled
while I work on the new one...

   Hugo.

 On Sun, Apr 10, 2011 at 10:06:04PM +0100, Hugo Mills wrote:
  diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
  index 7f78cc7..17c7ecc 100644
  --- a/fs/btrfs/ctree.h
  +++ b/fs/btrfs/ctree.h
  @@ -865,6 +865,11 @@ struct btrfs_block_group_cache {
  struct list_head cluster_list;
   };
   
  +struct btrfs_balance_info {
  +   u32 expected;
  +   u32 completed;
 
 two u32 make one u64
 
  +};
  +
   struct reloc_control;
   struct btrfs_device;
   struct btrfs_fs_devices;
  @@ -1078,6 +1083,10 @@ struct btrfs_fs_info {
   
  /* filesystem state */
  u64 fs_state;
  +
  +   /* Keep track of any rebalance operations on this FS */
  +   spinlock_t balance_info_lock;
  +   struct btrfs_balance_info *balance_info;
 
 a pointer is a u64 too
 
   };
   
   /*
  diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
  index dd13eb8..bb2ffed 100644
  --- a/fs/btrfs/volumes.c
  +++ b/fs/btrfs/volumes.c
  @@ -2051,6 +2052,20 @@ int btrfs_balance(struct btrfs_root *dev_root)
   	mutex_lock(&dev_root->fs_info->volume_mutex);
   	dev_root = dev_root->fs_info->dev_root;
   
  +	bal_info = kmalloc(
  +		sizeof(struct btrfs_balance_info),
  +		GFP_NOFS);
 
 ... drop
 
  +	if (!bal_info) {
  +		ret = -ENOMEM;
  +		goto error_no_status;
  +	}
  +	spin_lock(&dev_root->fs_info->balance_info_lock);
  +	dev_root->fs_info->balance_info = bal_info;
  +	bal_info->expected = -1; /* One less than actually counted,
  +				    because chunk 0 is special */
  +	bal_info->completed = 0;
  +	spin_unlock(&dev_root->fs_info->balance_info_lock);
  +
   	/* step one make some room on all the devices */
   	list_for_each_entry(device, devices, dev_list) {
   		old_size = device->total_bytes;
  @@ -2115,10 +2157,20 @@ int btrfs_balance(struct btrfs_root *dev_root)
   					   found_key.offset);
   		BUG_ON(ret && ret != -ENOSPC);
   		key.offset = found_key.offset - 1;
  +		spin_lock(&dev_root->fs_info->balance_info_lock);
  +		bal_info->completed++;
  +		spin_unlock(&dev_root->fs_info->balance_info_lock);
  +		printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
  +		       bal_info->completed, bal_info->expected);
   	}
   	ret = 0;
   error:
   	btrfs_free_path(path);
  +	spin_lock(&dev_root->fs_info->balance_info_lock);
  +	kfree(dev_root->fs_info->balance_info);
 
 ... drop
 
  +	dev_root->fs_info->balance_info = NULL;
  +	spin_unlock(&dev_root->fs_info->balance_info_lock);
  +error_no_status:
   	mutex_unlock(&dev_root->fs_info->volume_mutex);
   	return ret;
   }

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Never underestimate the bandwidth of a Volvo filled ---   
   with backup tapes.




Re: Kernel BUG at fs/btrfs/inode.c:2281 4665

2011-04-12 Thread Maria Wikström
On Tue, 2011-04-12 at 09:29 -0600, cwillu wrote:
 On Tue, Apr 12, 2011 at 9:11 AM,  ma...@ponstudios.se wrote:
 
  The computer was idle when the first bug happened and after the reboot
  btrfs can't be mounted. It can't delete orphans and replay the log.
  It would be nice if I can get the data out, there is nothing important
  there but it would be nice. I was actually fixing typos in the backup
  scripts when the bug attacked :)
 
 There's a btrfs-zero-log in the progs-unstable git repository that
 will probably get you up and running again.  You have to build it
 manually via make btrfs-zero-log.

Thanks! It worked, I got the data out :)

I found the culprit that has been crashing the computer every now and then
and has probably been corrupting btrfs: scsi_lib.c:1147.
So I don't think this is btrfs's fault; faulty hardware is more likely.

  I have written this by hand so there may be typos. I skipped the parts
  that I don't think are necessary; the rest is on paper in case I missed
  something important.
 
 Digital cameras are wonderful things :p

When they have charged batteries ;)

// Maria




Re: Kernel BUG at fs/btrfs/inode.c:2281 4665

2011-04-12 Thread Maria Wikström
On Tue, 2011-04-12 at 20:20 +0200, Peter Stuge wrote:
 Hej Maria!
 
 ma...@ponstudios.se wrote:
  The computer was idle when the first bug happened
 
 Ouch.
 
 
  and after the reboot btrfs can't be mounted.
 
 Can you get some messages out of btrfs tools run against the file
 system when offline, or an image of it? Maybe hook the fs up to
 another machine.
 
 
 //Peter
 

I got the data out after I cleared the log with btrfs-zero-log.

// Maria




Re: [PATCH v3] Re: btrfs does not work on usermode linux

2011-04-12 Thread Sergei Trofimovich
On Mon, 11 Apr 2011 15:50:48 -0400
Josef Bacik jo...@redhat.com wrote:

 On 04/11/2011 03:44 PM, Sergei Trofimovich wrote:
  Fix data corruption caused by memcpy() usage on overlapping data.
  I first observed it when I found that usermode linux crashes on btrfs.

...

 Fair enough, BUG_ON() it is.  Repost that version and you can add my
 
 Reviewed-by: Josef Bacik jo...@redhat.com

Thank you! Added and resent as:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09357.html

-- 

  Sergei




Re: Unable to perform online resize

2011-04-12 Thread Jan Steffens
On Wed, Apr 13, 2011 at 12:15 AM, Ali Lown a...@lown.me.uk wrote:
 I am using btrfs-progs latest git head in gentoo, along with a 2.6.37 kernel.
 I am unable to resize the filesystem online (or offline - though that
 doesn't seem to be an option).

 with btrfs:
 ---
 alipc-gentoo% sudo btrfs filesystem resize +20g ~/.wine/drive_c
 Resize '/home/ali/.wine/drive_c' of '+20g'
 ERROR: unable to resize '/home/ali/.wine/drive_c'
 ---

 with btrfsctl:
 ---
 alipc-gentoo% sudo btrfsctl -r +20g ~/.wine/drive_c
 ioctl:: File too large
 ---

 fdisk -l of the drive that partition is on (sda9 = ~/.wine/drive_c):
 ---
 Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
 Units = sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 512 bytes
 I/O size (minimum/optimal): 512 bytes / 512 bytes
 Disk identifier: 0x000e3826
    Device Boot      Start         End      Blocks   Id  System
 /dev/sda1              63  1953520064   976760001    5  Extended
 /dev/sda5      1009797705  1953520064   471861180   83  Linux
 /dev/sda6       642391218  1009797704   183703243+  83  Linux
 /dev/sda7             189    41945714    20972763    7  HPFS/NTFS
 /dev/sda8        41945778   104856254    31455238+  83  Linux
 /dev/sda9       104856318   209712509    52428096   83  Linux
 /dev/sda10      386250858   468166229    40957686   83  Linux
 Partition table entries are not in disk order
 ---
 For help in comprehending this, here is an image from gparted:
 http://img833.imageshack.us/i/gpartedd.png/

 Please excuse the hideous numbering and arrangement: I used gparted and
 built the layout up gradually, as and when I needed partitions.
 In short, sda9 is a logical partition inside an extended
 partition (sda1), and has approximately 85GB of unpartitioned space
 (inside the extended partition) following it, into which I would like
 to resize it.
 Thanks.
 -Ali


btrfs resizing (like resize2fs) does not manipulate the size of partitions.

First, you need to expand the partition using fdisk or gparted.
Afterwards you can use btrfs filesystem resize max ~/.wine/drive_c
to grow the filesystem to the size of the partition.
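
The numbers in the quoted fdisk listing can be checked with a little
arithmetic. The sketch below is only an illustration; the helper names are
hypothetical. It takes the start and end sectors of sda9 and the start sector
of sda10 (the next partition on disk) from the listing and assumes the
512-byte sectors that fdisk reports. The result is an upper bound on how far
sda9 can be extended, since a logical partition also needs a few sectors of
bookkeeping in front of the partition that follows it.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL	/* from "Units = sectors of 1 * 512" */

/* Number of sectors in a partition with inclusive start and end sectors. */
static uint64_t partition_sectors(uint64_t start, uint64_t end)
{
	return end - start + 1;
}

/* Sectors lying strictly between the end of one partition and the start of the next. */
static uint64_t gap_sectors(uint64_t end_of_prev, uint64_t start_of_next)
{
	return start_of_next - end_of_prev - 1;
}

/* Convert a sector count to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_SIZE;
}
```

For sda9 (104856318 to 209712509) this gives 53686370304 bytes, which matches
the 52428096 1K-blocks in the listing, so the filesystem already fills its
partition. The gap up to sda10 (starting at 386250858) is 176538348 sectors,
or 90387634176 bytes, roughly 84 GiB, in line with the "approximately 85GB"
mentioned in the report. That space only becomes usable by the filesystem
after the partition itself has been enlarged.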