Re: 2.6.39-rc1: btrfs "WARNING: at fs/btrfs/inode.c:2177"

2011-04-07 Thread Jeff Wu

Hi,
I applied the patch to 2.6.39-rc1 and built it with the following steps:
make && make modules_install && make install && mkinitramfs
but it seems that execution never reaches "WARN_ON(block_rsv ==
root->orphan_block_rsv);".

I have attached the code and logs below:



int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
			struct btrfs_root *root,
			struct btrfs_block_rsv *block_rsv,
			u64 num_bytes)
{
	int ret;

	WARN_ON(block_rsv == root->orphan_block_rsv);
	if (num_bytes == 0)
		return 0;

	ret = reserve_metadata_bytes(trans, root, block_rsv, num_bytes, 1);
	if (!ret) {
		block_rsv_add_bytes(block_rsv, num_bytes, 1);
		return 0;
	}

	return ret;
}
..




1.log1

..
[  147.740003] CE: hpet5 increased min_delta_ns to 7500 nsec
[  147.740012] CE: hpet5 increased min_delta_ns to 11250 nsec
[  148.520005] CE: hpet4 increased min_delta_ns to 7500 nsec
[  148.520012] CE: hpet4 increased min_delta_ns to 11250 nsec
[ 2561.740727] [ cut here ]
[ 2561.740746] WARNING: at fs/btrfs/inode.c:2177 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[ 2561.740748] Hardware name: OptiPlex 780 
[ 2561.740750] Modules linked in: i915 btrfs fbcon tileblit font
snd_hda_codec_analog bitblit softcursor drm_kms_helper drm snd_hda_intel
snd_hda_codec snd_hwdep snd_pcm zlib_deflate crc32c snd_timer libcrc32c
snd psmouse i2c_algo_bit ppdev intel_agp parport_pc soundcore lp
intel_gtt snd_page_alloc video dell_wmi parport serio_raw sparse_keymap
r8169 mii ata_piix
[ 2561.740781] Pid: 570, comm: btrfs-transacti Not tainted 2.6.39-rc1 #2
[ 2561.740783] Call Trace:
[ 2561.740789]  [] warn_slowpath_common+0x7f/0xc0
[ 2561.740793]  [] warn_slowpath_null+0x1a/0x20
[ 2561.740803]  [] btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]
[ 2561.740813]  [] commit_fs_roots+0xa9/0x150 [btrfs]
[ 2561.740824]  [] btrfs_commit_transaction+0x34b/0x750 [btrfs]
[ 2561.740828]  [] ? wake_up_bit+0x40/0x40
[ 2561.740838]  [] transaction_kthread+0x283/0x290 [btrfs]
[ 2561.740848]  [] ? btrfs_bio_wq_end_io+0x90/0x90 [btrfs]
[ 2561.740851]  [] kthread+0xb6/0xc0
[ 2561.740854]  [] ? trace_hardirqs_on_caller+0x13d/0x180
[ 2561.740859]  [] kernel_thread_helper+0x4/0x10
[ 2561.740862]  [] ? retint_restore_args+0x13/0x13
[ 2561.740865]  [] ? __init_kthread_worker+0x70/0x70
[ 2561.740868]  [] ? gs_change+0x13/0x13
[ 2561.740870] ---[ end trace c68c126da4200e73 ]---
[ 2655.461017] 
[ 2655.461019] =
[ 2655.467908] [ INFO: possible recursive locking detected ]
[ 2655.470882] 2.6.39-rc1 #2
[ 2655.470882] -
[ 2655.470882] cosd/2420 is trying to acquire lock:
[ 2655.470882]  (&(&eb->lock)->rlock){+.+...}, at: [] btrfs_try_spin_lock+0x59/0x100 [btrfs]
[ 2655.470882] 
[ 2655.470882] but task is already holding lock:
[ 2655.470882]  (&(&eb->lock)->rlock){+.+...}, at: [] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[ 2655.470882] 
[ 2655.470882] other info that might help us debug this:
[ 2655.470882] 2 locks held by cosd/2420:
[ 2655.470882]  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [] do_last+0x2f5/0x8a0
[ 2655.470882]  #1:  (&(&eb->lock)->rlock){+.+...}, at: [] btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[ 2655.470882] 
[ 2655.470882] stack backtrace:
[ 2655.470882] Pid: 2420, comm: cosd Tainted: GW   2.6.39-rc1 #2
[ 2655.470882] Call Trace:
[ 2655.470882]  [] __lock_acquire+0x1154/0x14b0
[ 2655.470882]  [] lock_acquire+0xa0/0x150
[ 2655.470882]  [] ? btrfs_try_spin_lock+0x59/0x100 [btrfs]
[ 2655.470882]  [] _raw_spin_lock+0x31/0x40
[ 2655.470882]  [] ? btrfs_try_spin_lock+0x59/0x100 [btrfs]
[ 2655.470882]  [] ? btrfs_clear_lock_blocking+0x22/0x30 [btrfs]
[ 2655.470882]  [] btrfs_try_spin_lock+0x59/0x100 [btrfs]
[ 2655.470882]  [] btrfs_search_slot+0x7d2/0x850 [btrfs]
[ 2655.470882]  [] btrfs_lookup_dir_item+0x82/0x110 [btrfs]
[ 2655.470882]  [] ? kmem_cache_alloc+0xe5/0x140
[ 2655.470882]  [] btrfs_lookup_dentry+0xa1/0x4b0 [btrfs]
[ 2655.470882]  [] ? d_alloc+0x141/0x1e0
[ 2655.470882]  [] ? trace_hardirqs_on+0xd/0x10
[ 2655.470882]  [] ? do_raw_spin_unlock+0x5e/0xb0
[ 2655.470882]  [] btrfs_lookup+0x16/0x30 [btrfs]
[ 2655.470882]  [] d_alloc_and_lookup+0x45/0x90
[ 2655.470882]  [] ? d_lookup+0x35/0x60
[ 2655.470882]  [] __lookup_hash+0xde/0x180
[ 2655.470882]  [] do_last+0x305/0x8a0
[ 2655.470882]  [] path_openat+0xcd/0x3f0
[ 2655.470882]  [] do_filp_open+0x7f/0xa0
[ 2655.470882]  [] ? _raw_spin_unlock+0x2b/0x40
[ 2655.470882]  [] ? alloc_fd+0xfa/0x140
[ 2655.470882]  [] do_sys_open+0x104/0x1e0
[ 2655.470882]  [] sys_open+0x20/0x30
[ 2655.470882]  [] system_call_fastpath+0x16/0x1b

2.log2

..

Re: 2.6.29-rc2 oops and assertion failure...

2011-04-07 Thread Daniel J Blueman
Hi Josef, Chris,

On 8 April 2011 00:23, Josef Bacik  wrote:
> On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
>>
>> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
>> an older (extent refcounting) issue, I am consistently able to hit an
>> oops [] and an assertion failure [].
>
> Sorry about that, please apply the patch I just sent this morning
>
> [PATCH] Btrfs: deal with the case that we run out of space in the cache

Superb work - the btrfs_write_out_cache oops is addressed, so now we
(separately) hit a few other assertions at: volumes.c:2013 [1],
volumes.c:2063 [2] and volumes.c:2703 [3] with the previous
reproducer.

Let me know if adding any debugging or other testing may be useful.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/volumes.c:2013!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: ppp_generic slhc tun brd loop

Pid: 17040, comm: btrfs Tainted: GW   2.6.39-rc2-350cd+ #3
Supermicro X8STi/X8STi
RIP: 0010:[]  [] btrfs_balance+0x27b/0x280
RSP: 0018:88015c923e08  EFLAGS: 00010282
RAX: fffb RBX: 880301d6e1b0 RCX: 0040
RDX: fffb RSI:  RDI: 8112e425
RBP: 88015c923e88 R08:  R09: 8802f8ee53f0
R10: 0012 R11: 0098 R12: 8802f909a490
R13: 8802f909bc38 R14: 1000 R15: 7fffd1599ce0
FS:  7f3c4b6f4740() GS:88031fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00f00098 CR3: 00015c921000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 17040, threadinfo 88015c922000, task 88030b898000)
Stack:
 880307cd5498 880301d6c120 88015c923e38 81085b9e
 880308a5d700 0008 88015c923f48 81031d5c
 ea000a9e7b40 88015c923f58 88030b898000 88015c8aa300
Call Trace:
 [] ? up_read+0x1e/0x40
 [] ? do_page_fault+0x1cc/0x440
 [] btrfs_ioctl+0x450/0x590
 [] do_vfs_ioctl+0x8d/0x330
 [] ? fget_light+0x274/0x3c0
 [] ? __do_fault+0x150/0x5d0
 [] sys_ioctl+0x4a/0x80
 [] system_call_fastpath+0x16/0x1b
Code: 81 c7 d8 22 00 00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0
eb d2 85 c0 74 a7 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f>
0b eb fe 90 55 48 89 e5 48 83 ec 40 8b 05 e2 62 72 00 4c 89
RIP  [] btrfs_balance+0x27b/0x280
 RSP 

--- [2]

kernel BUG at fs/btrfs/volumes.c:2063!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: brd loop

Pid: 13460, comm: btrfs Tainted: GW   2.6.39-rc2-350cd+ #3
Supermicro X8STi/X8STi
RIP: 0010:[]  [] btrfs_balance+0x26b/0x280
RSP: 0018:8800b1827e08  EFLAGS: 00010282
RAX: fffb RBX: 88030934d168 RCX: 0006
RDX: fffb RSI: 880308fc06f0 RDI: 880308fc
RBP: 8800b1827e88 R08:  R09: 
R10:  R11:  R12: 8802ff5455e8
R13: 8800b1827e38 R14: 00010d56 R15: 8800b1827e18
FS:  7fce737e5740() GS:88031fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 02371688 CR3: b1ff8000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 13460, threadinfo 8800b1826000, task 880308fc)
Stack:
 0100 88030934e1b0 0100 010d56e4
 880308837a00 0008 0100 0113bbe4
 880308fc0600 8800b1827f58 880308fc 8801f8c56c00
Call Trace:
 [] btrfs_ioctl+0x450/0x590
 [] do_vfs_ioctl+0x8d/0x330
 [] ? fget_light+0x2bf/0x3c0
 [] ? trace_hardirqs_on_caller+0x14d/0x190
 [] sys_ioctl+0x4a/0x80
 [] system_call_fastpath+0x16/0x1b
Code: 7c 90 fb ff 48 8b 55 88 48 8b ba 58 01 00 00 48 81 c7 d8 22 00
00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0 eb d2 85 c0 74 a7 <0f>
0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 90
RIP  [] btrfs_balance+0x26b/0x280
 RSP 

--- [3]

kernel BUG at fs/btrfs/volumes.c:2703!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-3/uevent
CPU 0
Modules linked in: brd loop

Pid: 14333, comm: btrfs-delalloc- Tainted: GW
2.6.39-rc2-350cd+ #3 Supermicro X8STi/X8STi
RIP: 0010:[]  []
__finish_chunk_alloc+0x212/0x220
RSP: 0018:8803007e7af0  EFLAGS: 00010286
RAX: ffe4 RBX: 88024e54e000 RCX: 0040
RDX:  RSI:  RDI: 8112e425
RBP: 8803007e7b70 R08:  R09: 8803072fe168
R10: 0012 R11: 0098 R12: 880303c192a8
R13: 88020a461e7

Re: [PATCH v4 1/8] btrfs: Balance progress monitoring

2011-04-07 Thread Li Zefan
01:06, Hugo Mills wrote:
> This patch introduces a basic form of progress monitoring for balance
> operations, by counting the number of block groups remaining. The
> information is exposed to userspace by an ioctl.
> 
> Signed-off-by: Hugo Mills 
> ---
>  fs/btrfs/ctree.h   |9 +++
>  fs/btrfs/disk-io.c |2 +
>  fs/btrfs/ioctl.c   |   34 +
>  fs/btrfs/ioctl.h   |7 ++
>  fs/btrfs/volumes.c |   61 ++-
>  5 files changed, 111 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 7f78cc7..6c5526c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -865,6 +865,11 @@ struct btrfs_block_group_cache {
>   struct list_head cluster_list;
>  };
>  
> +struct btrfs_balance_info {
> + u64 expected;
> + u64 completed;
> +};
> +

Those can be u32.

And how about also counting the total size and used size of all the chunks?
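A rough userspace sketch of the structure suggested above, with 32-bit block-group counters and 64-bit byte totals. The names balance_info_sketch, total_bytes, used_bytes and balance_percent_done() are hypothetical and not part of the posted patch.

```c
#include <stdint.h>

/*
 * Hypothetical layout following the review: block-group counters
 * narrowed to 32 bits, plus byte totals summed over all chunks.
 */
struct balance_info_sketch {
	uint32_t expected;     /* block groups to be relocated */
	uint32_t completed;    /* block groups relocated so far */
	uint64_t total_bytes;  /* summed size of all chunks (hypothetical) */
	uint64_t used_bytes;   /* summed used bytes of all chunks (hypothetical) */
};

/* Percentage of block groups done; 0 when nothing is expected yet. */
static unsigned int balance_percent_done(const struct balance_info_sketch *bi)
{
	if (bi->expected == 0)
		return 0;
	/* Widen before multiplying so a large count cannot overflow. */
	return (unsigned int)((uint64_t)bi->completed * 100 / bi->expected);
}
```

A 32-bit counter is ample for the number of block groups, while byte totals for large filesystems need the full 64 bits.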

>  struct reloc_control;
>  struct btrfs_device;
>  struct btrfs_fs_devices;
> @@ -1078,6 +1083,10 @@ struct btrfs_fs_info {
>  
>   /* filesystem state */
>   u64 fs_state;
> +
> + /* Keep track of any rebalance operations on this FS */
> + spinlock_t balance_info_lock;
> + struct btrfs_balance_info *balance_info;
>  };
>  
>  /*
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 100b07f..3d690de 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1645,6 +1645,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>   spin_lock_init(&fs_info->ref_cache_lock);
>   spin_lock_init(&fs_info->fs_roots_radix_lock);
>   spin_lock_init(&fs_info->delayed_iput_lock);
> + spin_lock_init(&fs_info->balance_info_lock);
>  
>   init_completion(&fs_info->kobj_unregister);
>   fs_info->tree_root = tree_root;
> @@ -1670,6 +1671,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>   fs_info->sb = sb;
>   fs_info->max_inline = 8192 * 1024;
>   fs_info->metadata_ratio = 0;
> + fs_info->balance_info = NULL;
>  
>   fs_info->thread_pool_size = min_t(unsigned long,
> num_online_cpus() + 2, 8);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 5fdb2ab..a8fbb07 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2375,6 +2375,38 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
>   return btrfs_wait_for_commit(root, transid);
>  }
>  
> +/*
> + * Return the current status of any balance operation
> + */
> +long btrfs_ioctl_balance_progress(
> + struct btrfs_fs_info *fs_info,
> + struct btrfs_ioctl_balance_progress __user *user_dest)
> +{
> + int ret = 0;
> + struct btrfs_ioctl_balance_progress dest;
> +
> + spin_lock(&fs_info->balance_info_lock);
> + if (!fs_info->balance_info) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + dest.expected = fs_info->balance_info->expected;
> + dest.completed = fs_info->balance_info->completed;
> +
> + spin_unlock(&fs_info->balance_info_lock);
> +
> + if (copy_to_user(user_dest, &dest,
> +  sizeof(struct btrfs_ioctl_balance_progress)))
> + return -EFAULT;
> +
> + return 0;
> +
> +error:
> + spin_unlock(&fs_info->balance_info_lock);
> + return ret;
> +}
> +
>  long btrfs_ioctl(struct file *file, unsigned int
>   cmd, unsigned long arg)
>  {
> @@ -2414,6 +2446,8 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_ioctl_rm_dev(root, argp);
>   case BTRFS_IOC_BALANCE:
>   return btrfs_balance(root->fs_info->dev_root);
> + case BTRFS_IOC_BALANCE_PROGRESS:
> + return btrfs_ioctl_balance_progress(root->fs_info, argp);
>   case BTRFS_IOC_CLONE:
>   return btrfs_ioctl_clone(file, arg, 0, 0, 0);
>   case BTRFS_IOC_CLONE_RANGE:
> diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
> index 8fb3821..4c82d40 100644
> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> @@ -157,6 +157,11 @@ struct btrfs_ioctl_space_args {
>   struct btrfs_ioctl_space_info spaces[0];
>  };
>  
> +struct btrfs_ioctl_balance_progress {
> + __u64 expected;
> + __u64 completed;
> +};
> +
>  #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
>  struct btrfs_ioctl_vol_args)
>  #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
> @@ -203,4 +208,6 @@ struct btrfs_ioctl_space_args {
>  struct btrfs_ioctl_vol_args_v2)
>  #define BTRFS_IOC_SUBVOL_GETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 25, __u64)
>  #define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)
> +#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 27, \
> +   struct btrfs_ioctl_balance_progress)
>  #endif
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dd13eb8..2bd4565 100644
> --- a/fs/b

Re: [PATCH v4 3/8] btrfs: Factor out enumeration of chunks to a separate function

2011-04-07 Thread Josef Bacik

On 04/07/2011 01:06 PM, Hugo Mills wrote:

The main balance function has two loops which are functionally
identical in their looping mechanism, but which perform a different
operation on the chunks they loop over. To avoid repeating code more
than necessary, factor this loop out into a separate iterator function
which takes a function parameter for the action to be performed.
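A minimal userspace sketch of the iterator-with-callback pattern described above, assuming chunks can be modelled as an ascending array of chunk offsets instead of a chunk-tree search. struct bal_state, chunk_fn, count_chunk(), move_chunk() and iterate_chunks() are hypothetical stand-ins for balance_iterate_chunks() and its two callbacks.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct btrfs_balance_info. */
struct bal_state {
	uint64_t expected;
	uint64_t completed;
	int cancel_pending;
};

/* The per-chunk action, analogous to balance_iterator_function. */
typedef void (*chunk_fn)(struct bal_state *bal, uint64_t chunk_offset);

/* Phase one: only count the chunks. */
static void count_chunk(struct bal_state *bal, uint64_t chunk_offset)
{
	(void)chunk_offset;
	bal->expected++;
}

/* Phase two: "move" each chunk; a real version would relocate it here. */
static void move_chunk(struct bal_state *bal, uint64_t chunk_offset)
{
	(void)chunk_offset;
	bal->completed++;
}

/*
 * Walk chunks from the highest offset down, as the patch does by
 * starting at key.offset = (u64)-1 and stepping back, stop at chunk
 * zero (which is special) or on cancellation, and apply fn to each.
 * `offsets` must be sorted in ascending order.
 */
static void iterate_chunks(struct bal_state *bal, const uint64_t *offsets,
			   size_t n, chunk_fn fn)
{
	for (size_t i = n; i > 0 && !bal->cancel_pending; i--) {
		uint64_t off = offsets[i - 1];

		if (off == 0)
			break;
		fn(bal, off);
	}
}
```

The same loop serves both phases; only the callback differs, which is the duplication the patch removes.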

Signed-off-by: Hugo Mills
---
  fs/btrfs/volumes.c |  179 +---
  1 files changed, 99 insertions(+), 80 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5378b94..ffba817 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2029,6 +2029,97 @@ static u64 div_factor(u64 num, int factor)
return num;
  }

+/* Define a type, and two functions which can be used for the two
+ * phases of the balance operation: one for counting chunks, and one
+ * for actually moving them. */
+typedef void (*balance_iterator_function)(struct btrfs_root *,
+ struct btrfs_balance_info *,
+ struct btrfs_path *,
+ struct btrfs_key *);
+
+void balance_count_chunks(struct btrfs_root *chunk_root,
+ struct btrfs_balance_info *bal_info,
+ struct btrfs_path *path,
+ struct btrfs_key *key)
+{
+   spin_lock(&chunk_root->fs_info->balance_info_lock);
+   bal_info->expected++;
+   spin_unlock(&chunk_root->fs_info->balance_info_lock);
+}
+
+void balance_move_chunks(struct btrfs_root *chunk_root,
+struct btrfs_balance_info *bal_info,
+struct btrfs_path *path,
+struct btrfs_key *key)
+{
+   int ret;
+
+   ret = btrfs_relocate_chunk(chunk_root,
+  chunk_root->root_key.objectid,
+  key->objectid,
+  key->offset);
+   BUG_ON(ret && ret != -ENOSPC);
+   spin_lock(&chunk_root->fs_info->balance_info_lock);
+   bal_info->completed++;
+   spin_unlock(&chunk_root->fs_info->balance_info_lock);
+   printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
+  bal_info->completed, bal_info->expected);
+}
+
+/* Iterate through all chunks, performing some function on each one. */
+int balance_iterate_chunks(struct btrfs_root *chunk_root,
+  struct btrfs_balance_info *bal_info,
+  balance_iterator_function fn)
+{
+   int ret;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOSPC;


Return -ENOMEM; we're out of memory, not space.
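To illustrate the point, a hypothetical userspace version of that allocation check with an injectable allocator, so the failure path can actually be exercised. path_sketch, alloc_fn, start_iteration() and failing_alloc() are illustrative names only.

```c
#include <errno.h>
#include <stdlib.h>

/* Allocator hook (hypothetical) so a failed allocation can be simulated. */
typedef void *(*alloc_fn)(size_t size);

struct path_sketch {
	int slots[8];
};

/*
 * A failed allocation is an out-of-memory condition, so report
 * -ENOMEM; -ENOSPC would wrongly suggest the filesystem is full.
 */
static int start_iteration(alloc_fn alloc, struct path_sketch **out)
{
	struct path_sketch *path = alloc(sizeof(*path));

	if (!path)
		return -ENOMEM;
	*out = path;
	return 0;
}

/* Test allocator that always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```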


+
+   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+   key.offset = (u64)-1;
+   key.type = BTRFS_CHUNK_ITEM_KEY;
+
+   while (!bal_info->cancel_pending) {
+   ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
+   if (ret < 0)
+   break;
+   /*
+    * this shouldn't happen, it means the last relocate
+    * failed
+    */
+   if (ret == 0)
+   break;
+
+   ret = btrfs_previous_item(chunk_root, path, 0,
+ BTRFS_CHUNK_ITEM_KEY);
+   if (ret)
+   break;
+
+   btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+ path->slots[0]);
+   if (found_key.objectid != key.objectid)
+   break;
+
+   /* chunk zero is special */
+   if (found_key.offset == 0)
+   break;
+
+   /* Call the function to do the work for this chunk */
+   btrfs_release_path(chunk_root, path);
+   fn(chunk_root, bal_info, path, &found_key);
+
+   key.offset = found_key.offset - 1;
+   }
+
+   btrfs_free_path(path);
+   return ret;
+}
+
  int btrfs_balance(struct btrfs_root *dev_root)
  {
int ret;
@@ -2036,11 +2127,8 @@ int btrfs_balance(struct btrfs_root *dev_root)
struct btrfs_device *device;
u64 old_size;
u64 size_to_free;
-   struct btrfs_path *path;
-   struct btrfs_key key;
struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
struct btrfs_trans_handle *trans;
-   struct btrfs_key found_key;
struct btrfs_balance_info *bal_info;

if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
@@ -2061,8 +2149,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
}
spin_lock(&dev_root->fs_info->balance_info_lock);
dev_root->fs_info->balance_info = bal_info;
-   bal_info->expected = -1; /* One less than actually counted

Re: [PATCH v4 1/8] btrfs: Balance progress monitoring

2011-04-07 Thread Josef Bacik

On 04/07/2011 01:06 PM, Hugo Mills wrote:

This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

Signed-off-by: Hugo Mills
---
  fs/btrfs/ctree.h   |9 +++
  fs/btrfs/disk-io.c |2 +
  fs/btrfs/ioctl.c   |   34 +
  fs/btrfs/ioctl.h   |7 ++
  fs/btrfs/volumes.c |   61 ++-
  5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7f78cc7..6c5526c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -865,6 +865,11 @@ struct btrfs_block_group_cache {
struct list_head cluster_list;
  };

+struct btrfs_balance_info {
+   u64 expected;
+   u64 completed;
+};
+
  struct reloc_control;
  struct btrfs_device;
  struct btrfs_fs_devices;
@@ -1078,6 +1083,10 @@ struct btrfs_fs_info {

/* filesystem state */
u64 fs_state;
+
+   /* Keep track of any rebalance operations on this FS */
+   spinlock_t balance_info_lock;
+   struct btrfs_balance_info *balance_info;
  };

  /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 100b07f..3d690de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1645,6 +1645,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->ref_cache_lock);
spin_lock_init(&fs_info->fs_roots_radix_lock);
spin_lock_init(&fs_info->delayed_iput_lock);
+   spin_lock_init(&fs_info->balance_info_lock);

init_completion(&fs_info->kobj_unregister);
fs_info->tree_root = tree_root;
@@ -1670,6 +1671,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
fs_info->sb = sb;
fs_info->max_inline = 8192 * 1024;
fs_info->metadata_ratio = 0;
+   fs_info->balance_info = NULL;

fs_info->thread_pool_size = min_t(unsigned long,
  num_online_cpus() + 2, 8);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5fdb2ab..a8fbb07 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2375,6 +2375,38 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
return btrfs_wait_for_commit(root, transid);
  }

+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+   struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+   int ret = 0;
+   struct btrfs_ioctl_balance_progress dest;
+
+   spin_lock(&fs_info->balance_info_lock);
+   if (!fs_info->balance_info) {
+   ret = -EINVAL;
+   goto error;
+   }
+
+   dest.expected = fs_info->balance_info->expected;
+   dest.completed = fs_info->balance_info->completed;
+
+   spin_unlock(&fs_info->balance_info_lock);
+
+   if (copy_to_user(user_dest, &dest,
+sizeof(struct btrfs_ioctl_balance_progress)))
+   return -EFAULT;
+
+   return 0;
+
+error:
+   spin_unlock(&fs_info->balance_info_lock);
+   return ret;
+}
+
  long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
  {
@@ -2414,6 +2446,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_rm_dev(root, argp);
case BTRFS_IOC_BALANCE:
return btrfs_balance(root->fs_info->dev_root);
+   case BTRFS_IOC_BALANCE_PROGRESS:
+   return btrfs_ioctl_balance_progress(root->fs_info, argp);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 8fb3821..4c82d40 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -157,6 +157,11 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_space_info spaces[0];
  };

+struct btrfs_ioctl_balance_progress {
+   __u64 expected;
+   __u64 completed;
+};
+
  #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
  #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -203,4 +208,6 @@ struct btrfs_ioctl_space_args {
   struct btrfs_ioctl_vol_args_v2)
  #define BTRFS_IOC_SUBVOL_GETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 25, __u64)
  #define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 27, \
+ struct btrfs_ioctl_balance_progress)
  #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dd13eb8..2bd4565 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2041,6 +2041,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
struct btrfs_trans_handle *trans;
stru

[PATCH v4 2/8] btrfs: Cancel filesystem balance

2011-04-07 Thread Hugo Mills
This patch adds an ioctl for cancelling a btrfs balance operation
mid-flight. The ioctl simply sets a flag, and the operation terminates
after the current block group move has completed.
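The cancel mechanism described above, modelled in single-threaded userspace C. It assumes the same return codes as the patch (-EINVAL when no balance is running, -ECANCELED for a repeated request, -EINTR from the interrupted balance) and omits the spinlock and the CAP_SYS_ADMIN check; balance_sketch, request_cancel() and run_balance() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

struct balance_sketch {
	unsigned long expected;
	unsigned long completed;
	int cancel_pending;
};

/* Mirrors the return convention of btrfs_ioctl_balance_cancel(). */
static int request_cancel(struct balance_sketch *bal)
{
	if (!bal)
		return -EINVAL;      /* no balance running */
	if (bal->cancel_pending)
		return -ECANCELED;   /* cancel already requested */
	bal->cancel_pending = 1;
	return 0;
}

/*
 * The flag is only checked between block groups, so the group in
 * progress always finishes.  cancel_after is a test hook (hypothetical)
 * that requests cancellation once that many groups are done; 0 never
 * cancels.
 */
static int run_balance(struct balance_sketch *bal, unsigned long cancel_after)
{
	while (!bal->cancel_pending && bal->completed < bal->expected) {
		bal->completed++;    /* "move" one block group */
		if (bal->completed == cancel_after)
			request_cancel(bal);
	}
	return bal->cancel_pending ? -EINTR : 0;
}
```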

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/ioctl.c   |   28 
 fs/btrfs/ioctl.h   |1 +
 fs/btrfs/volumes.c |7 ++-
 4 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6c5526c..8b99807 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -868,6 +868,7 @@ struct btrfs_block_group_cache {
 struct btrfs_balance_info {
u64 expected;
u64 completed;
+   int cancel_pending;
 };
 
 struct reloc_control;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a8fbb07..aef6329 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2407,6 +2407,32 @@ error:
return ret;
 }
 
+/*
+ * Cancel a running balance operation
+ */
+long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
+{
+   int err = 0;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   spin_lock(&fs_info->balance_info_lock);
+   if (!fs_info->balance_info) {
+   err = -EINVAL;
+   goto error;
+   }
+   if (fs_info->balance_info->cancel_pending) {
+   err = -ECANCELED;
+   goto error;
+   }
+   fs_info->balance_info->cancel_pending = 1;
+
+error:
+   spin_unlock(&fs_info->balance_info_lock);
+   return err;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2448,6 +2474,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_balance(root->fs_info->dev_root);
case BTRFS_IOC_BALANCE_PROGRESS:
return btrfs_ioctl_balance_progress(root->fs_info, argp);
+   case BTRFS_IOC_BALANCE_CANCEL:
+   return btrfs_ioctl_balance_cancel(root->fs_info);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 4c82d40..b08a699 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -210,4 +210,5 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 27, \
  struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 28)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2bd4565..5378b94 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2064,6 +2064,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
bal_info->expected = -1; /* One less than actually counted,
because chunk 0 is special */
bal_info->completed = 0;
+   bal_info->cancel_pending = 0;
spin_unlock(&dev_root->fs_info->balance_info_lock);
 
/* step one make some room on all the devices */
@@ -2129,7 +2130,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
key.offset = (u64)-1;
key.type = BTRFS_CHUNK_ITEM_KEY;
 
-   while (1) {
+   while (!bal_info->cancel_pending) {
ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
if (ret < 0)
goto error;
@@ -2169,6 +2170,10 @@ int btrfs_balance(struct btrfs_root *dev_root)
   bal_info->completed, bal_info->expected);
}
ret = 0;
+   if (bal_info->cancel_pending) {
+   printk(KERN_INFO "btrfs: balance cancelled\n");
+   ret = -EINTR;
+   }
 error:
btrfs_free_path(path);
spin_lock(&dev_root->fs_info->balance_info_lock);
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 5/8] btrfs: Balance filter for device ID

2011-04-07 Thread Hugo Mills
Balance filter to take only chunks which have (or had) a stripe on the
given device. Useful if a device has been forcibly removed from the
filesystem, and the data from that device needs rebuilding.
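A sketch of the stripe-matching test this filter performs, assuming a chunk's stripes are given as a plain array of device IDs rather than read from an extent buffer with btrfs_stripe_devid(); chunk_has_stripe_on() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * A chunk is selected if any of its stripes lives (or lived) on the
 * requested device; otherwise the filter rejects it.
 */
static int chunk_has_stripe_on(const uint64_t *stripe_devids,
			       size_t num_stripes, uint64_t devid)
{
	for (size_t i = 0; i < num_stripes; i++) {
		if (stripe_devids[i] == devid)
			return 1;
	}
	return 0;
}
```

For a RAID1 chunk mirrored on devices 1 and 3, filtering on device 3 selects it, while filtering on device 2 does not.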

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ioctl.h   |8 ++--
 fs/btrfs/volumes.c |   16 +++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 2ce2180..29627ca 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -166,7 +166,8 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_BALANCE_FILTER_COUNT_ONLY 0x1
 
 #define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x2
-#define BTRFS_BALANCE_FILTER_MASK 0x3 /* Logical or of all filter
+#define BTRFS_BALANCE_FILTER_DEVID 0x4
+#define BTRFS_BALANCE_FILTER_MASK 0x7 /* Logical or of all filter
   * flags -- effectively versions
   * the filtered balance ioctl */
 
@@ -183,7 +184,10 @@ struct btrfs_ioctl_balance_start {
__u64 chunk_type;  /* Flag bits required */
__u64 chunk_type_mask; /* Mask of bits to examine */
 
-   __u64 spare[506]; /* Make up the size of the structure to 4088
+   /* For FILTER_DEVID */
+   __u64 devid;
+
+   __u64 spare[505]; /* Make up the size of the structure to 4088
   * bytes for future expansion */
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ea77c63..4f215e7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2036,6 +2036,7 @@ int balance_chunk_filter(struct btrfs_ioctl_balance_start *filter,
 {
struct extent_buffer *eb;
struct btrfs_chunk *chunk;
+   int i;
 
/* No filter defined, everything matches */
if (!filter)
@@ -2056,8 +2057,21 @@ int balance_chunk_filter(struct btrfs_ioctl_balance_start *filter,
return 0;
}
}
+   if (filter->flags & BTRFS_BALANCE_FILTER_DEVID) {
+   int num_stripes = btrfs_chunk_num_stripes(eb, chunk);
+   int res = 0;
+   for (i = 0; i < num_stripes; i++) {
+   struct btrfs_stripe *stripe = btrfs_stripe_nr(chunk, i);
+   if (btrfs_stripe_devid(eb, stripe) == filter->devid) {
+   res = 1;
+   break;
+   }
+   }
+   if (!res)
+   return 0;
+   }
 
-   return ret;
+   return 1;
 }
 
 /* Define a type, and two functions which can be used for the two
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/8] btrfs: Balance progress monitoring

2011-04-07 Thread Hugo Mills
This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.
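A possible userspace caller for the new ioctl, assuming the struct layout and ioctl number from this v4 patch. The number 27 is specific to this series and may not match any merged kernel; get_balance_progress() is a hypothetical helper.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Copied from the ioctl.h hunk of this patch. */
#define BTRFS_IOCTL_MAGIC 0x94

struct btrfs_ioctl_balance_progress {
	__u64 expected;
	__u64 completed;
};

#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 27, \
					struct btrfs_ioctl_balance_progress)

/*
 * Query balance progress through a descriptor for any file or directory
 * on the btrfs filesystem.  Returns 0 on success or a negative errno;
 * with this patch, -EINVAL means no balance is running.
 */
static int get_balance_progress(int fd,
				struct btrfs_ioctl_balance_progress *out)
{
	if (ioctl(fd, BTRFS_IOC_BALANCE_PROGRESS, out) < 0)
		return -errno;
	return 0;
}
```

The descriptor would typically come from open() on the filesystem's mount point with O_RDONLY.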

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ctree.h   |9 +++
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/ioctl.c   |   34 +
 fs/btrfs/ioctl.h   |7 ++
 fs/btrfs/volumes.c |   61 ++-
 5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7f78cc7..6c5526c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -865,6 +865,11 @@ struct btrfs_block_group_cache {
struct list_head cluster_list;
 };
 
+struct btrfs_balance_info {
+   u64 expected;
+   u64 completed;
+};
+
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
@@ -1078,6 +1083,10 @@ struct btrfs_fs_info {
 
/* filesystem state */
u64 fs_state;
+
+   /* Keep track of any rebalance operations on this FS */
+   spinlock_t balance_info_lock;
+   struct btrfs_balance_info *balance_info;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 100b07f..3d690de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1645,6 +1645,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->ref_cache_lock);
spin_lock_init(&fs_info->fs_roots_radix_lock);
spin_lock_init(&fs_info->delayed_iput_lock);
+   spin_lock_init(&fs_info->balance_info_lock);
 
init_completion(&fs_info->kobj_unregister);
fs_info->tree_root = tree_root;
@@ -1670,6 +1671,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
fs_info->sb = sb;
fs_info->max_inline = 8192 * 1024;
fs_info->metadata_ratio = 0;
+   fs_info->balance_info = NULL;
 
fs_info->thread_pool_size = min_t(unsigned long,
  num_online_cpus() + 2, 8);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5fdb2ab..a8fbb07 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2375,6 +2375,38 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
return btrfs_wait_for_commit(root, transid);
 }
 
+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+   struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+   int ret = 0;
+   struct btrfs_ioctl_balance_progress dest;
+
+   spin_lock(&fs_info->balance_info_lock);
+   if (!fs_info->balance_info) {
+   ret = -EINVAL;
+   goto error;
+   }
+
+   dest.expected = fs_info->balance_info->expected;
+   dest.completed = fs_info->balance_info->completed;
+
+   spin_unlock(&fs_info->balance_info_lock);
+
+   if (copy_to_user(user_dest, &dest,
+sizeof(struct btrfs_ioctl_balance_progress)))
+   return -EFAULT;
+
+   return 0;
+
+error:
+   spin_unlock(&fs_info->balance_info_lock);
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2414,6 +2446,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_rm_dev(root, argp);
case BTRFS_IOC_BALANCE:
return btrfs_balance(root->fs_info->dev_root);
+   case BTRFS_IOC_BALANCE_PROGRESS:
+   return btrfs_ioctl_balance_progress(root->fs_info, argp);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 8fb3821..4c82d40 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -157,6 +157,11 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_space_info spaces[0];
 };
 
+struct btrfs_ioctl_balance_progress {
+   __u64 expected;
+   __u64 completed;
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -203,4 +208,6 @@ struct btrfs_ioctl_space_args {
   struct btrfs_ioctl_vol_args_v2)
 #define BTRFS_IOC_SUBVOL_GETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 25, __u64)
 #define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, __u64)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 27, \
+ struct btrfs_ioctl_balance_progress)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dd13eb8..2bd4565 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2041,6 +2041,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
struct btrfs_trans_handle *trans;
struct btrfs_key found_key;
+   struct btrfs_balance_inf

[PATCH v4 7/8] btrfs: Replication-type information

2011-04-07 Thread Hugo Mills
There are a few places in btrfs where knowledge of the various
parameters of a replication type is needed. Factor this out into a
single function which can supply all the relevant information.

Signed-off-by: Hugo Mills 
---
 fs/btrfs/super.c   |   16 +++-
 fs/btrfs/volumes.c |   96 ++-
 fs/btrfs/volumes.h |   17 +
 3 files changed, 87 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d39a989..4341730 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -879,12 +879,12 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
struct btrfs_device_info *devices_info;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *device;
+   struct btrfs_replication_info repl_info;
u64 skip_space;
u64 type;
u64 avail_space;
u64 used_space;
u64 min_stripe_size;
-   int min_stripes = 1;
int i = 0, nr_devices;
int ret;
 
@@ -898,12 +898,7 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 
/* calc min stripe number for data space alloction */
type = btrfs_get_alloc_profile(root, 1);
-   if (type & BTRFS_BLOCK_GROUP_RAID0)
-   min_stripes = 2;
-   else if (type & BTRFS_BLOCK_GROUP_RAID1)
-   min_stripes = 2;
-   else if (type & BTRFS_BLOCK_GROUP_RAID10)
-   min_stripes = 4;
+   btrfs_get_replication_info(&repl_info, type);
 
if (type & BTRFS_BLOCK_GROUP_DUP)
min_stripe_size = 2 * BTRFS_STRIPE_LEN;
@@ -971,14 +966,15 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 
i = nr_devices - 1;
avail_space = 0;
-   while (nr_devices >= min_stripes) {
+   while (nr_devices >= repl_info.devs_min) {
if (devices_info[i].max_avail >= min_stripe_size) {
int j;
u64 alloc_size;
 
-   avail_space += devices_info[i].max_avail * min_stripes;
+   avail_space += devices_info[i].max_avail
+ * repl_info.devs_min;
alloc_size = devices_info[i].max_avail;
-   for (j = i + 1 - min_stripes; j <= i; j++)
+   for (j = i + 1 - repl_info.devs_min; j <= i; j++)
devices_info[j].max_avail -= alloc_size;
}
i--;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4c1b5a6..83f13b6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -141,6 +141,51 @@ static void requeue_list(struct btrfs_pending_bios *pending_bios,
pending_bios->tail = tail;
 }
 
+void btrfs_get_replication_info(struct btrfs_replication_info *info,
+   u64 type)
+{
+   info->sub_stripes = 1;
+   info->dev_stripes = 1;
+   info->devs_increment = 1;
+   info->num_copies = 1;
+   info->devs_max = 0; /* 0 == as many as possible */
+   info->devs_min = 1;
+
+   if (type & (BTRFS_BLOCK_GROUP_DUP)) {
+   info->dev_stripes = 2;
+   info->num_copies = 2;
+   info->devs_max = 1;
+   } else if (type & (BTRFS_BLOCK_GROUP_RAID0)) {
+   info->devs_min = 2;
+   } else if (type & (BTRFS_BLOCK_GROUP_RAID1)) {
+   info->devs_increment = 2;
+   info->num_copies = 2;
+   info->devs_max = 2;
+   info->devs_min = 2;
+   } else if (type & (BTRFS_BLOCK_GROUP_RAID10)) {
+   info->sub_stripes = 2;
+   info->devs_increment = 2;
+   info->num_copies = 2;
+   info->devs_min = 4;
+   }
+
+   if (type & BTRFS_BLOCK_GROUP_DATA) {
+   info->max_stripe_size = 1024 * 1024 * 1024;
+   info->min_stripe_size = 64 * 1024 * 1024;
+   info->max_chunk_size = 10 * info->max_stripe_size;
+   } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+   info->max_stripe_size = 256 * 1024 * 1024;
+   info->min_stripe_size = 32 * 1024 * 1024;
+   info->max_chunk_size = info->max_stripe_size;
+   } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+   info->max_stripe_size = 8 * 1024 * 1024;
+   info->min_stripe_size = 1 * 1024 * 1024;
+   info->max_chunk_size = 2 * info->max_stripe_size;
+   } else {
+   BUG_ON(1);
+   }
+}
+
 /*
  * we try to collect pending bios for a device so we don't get a large
  * number of procs sending bios down to the same device.  This greatly
@@ -1248,6 +1293,7 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
struct block_device *bdev;
struct buffer_head *bh = NULL;
struct btrfs_super_block *disk_su

[PATCH v4 4/8] btrfs: Implement filtered balance ioctl

2011-04-07 Thread Hugo Mills
The filtered balance ioctl provides a facility to perform a balance
operation on a subset of the chunks in the filesystem. This patch
implements the base ioctl for this operation, and one filter type.
The filter in this patch selects chunks on the basis of their chunk
flags field, and can select any combination of bits set or unset.

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ioctl.c   |   40 +++-
 fs/btrfs/ioctl.h   |   27 +
 fs/btrfs/volumes.c |   65 +--
 fs/btrfs/volumes.h |    4 ++-
 4 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index aef6329..da3a2d3 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2433,6 +2433,42 @@ error:
return err;
 }
 
+long btrfs_ioctl_balance(struct btrfs_root *dev_root,
+struct btrfs_ioctl_balance_start __user *user_filters)
+{
+   int ret = 0;
+   struct btrfs_ioctl_balance_start *dest;
+
+   dest = kzalloc(sizeof(struct btrfs_ioctl_balance_start), GFP_KERNEL);
+   if (!dest)
+   return -ENOMEM;
+
+   if (user_filters && copy_from_user(dest, user_filters,
+  sizeof(struct btrfs_ioctl_balance_start))) {
+   ret = -EFAULT;
+   goto error;
+   }
+
+   /* Basic sanity checking: has the user requested anything outside
+* the range we know about? */
+   if (dest->flags & ~BTRFS_BALANCE_FILTER_MASK) {
+   ret = -EOPNOTSUPP;
+   goto error;
+   }
+
+   /* Do the balance */
+   ret = btrfs_balance(dev_root, dest);
+
+   if (user_filters && copy_to_user(user_filters, dest,
+sizeof(struct btrfs_ioctl_balance_start))) {
+   ret = -EFAULT;
+   }
+
+error:
+   kfree(dest);
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2471,11 +2507,13 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_RM_DEV:
return btrfs_ioctl_rm_dev(root, argp);
case BTRFS_IOC_BALANCE:
-   return btrfs_balance(root->fs_info->dev_root);
+   return btrfs_ioctl_balance(root->fs_info->dev_root, NULL);
case BTRFS_IOC_BALANCE_PROGRESS:
return btrfs_ioctl_balance_progress(root->fs_info, argp);
case BTRFS_IOC_BALANCE_CANCEL:
return btrfs_ioctl_balance_cancel(root->fs_info);
+   case BTRFS_IOC_BALANCE_FILTERED:
+   return btrfs_ioctl_balance(root->fs_info->dev_root, argp);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index b08a699..2ce2180 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -162,6 +162,31 @@ struct btrfs_ioctl_balance_progress {
__u64 completed;
 };
 
+/* Types of balance filter */
+#define BTRFS_BALANCE_FILTER_COUNT_ONLY 0x1
+
+#define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x2
+#define BTRFS_BALANCE_FILTER_MASK 0x3 /* Logical or of all filter
+  * flags -- effectively versions
+  * the filtered balance ioctl */
+
+/* All the possible options for a filter */
+struct btrfs_ioctl_balance_start {
+   __u64 flags; /* Bit field indicating which fields of this struct
+   are filled */
+
+   /* Output values: chunk counts */
+   __u64 examined;
+   __u64 balanced;
+
+   /* For FILTER_CHUNK_TYPE */
+   __u64 chunk_type;  /* Flag bits required */
+   __u64 chunk_type_mask; /* Mask of bits to examine */
+
+   __u64 spare[506]; /* Make up the size of the structure to 4088
+  * bytes for future expansion */
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -211,4 +236,6 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 27, \
  struct btrfs_ioctl_balance_progress)
 #define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 28)
+#define BTRFS_IOC_BALANCE_FILTERED _IOWR(BTRFS_IOCTL_MAGIC, 29, \
+   struct btrfs_ioctl_balance_start)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ffba817..ea77c63 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2029,6 +2029,37 @@ static u64 div_factor(u64 num, int factor)
return num;
 }
 
+int balance_chunk_filter(struct btrfs_ioctl_balance_start *filter,
+struct btrfs_root *chunk_root,
+struct btrfs_path *path,
+struct btrfs_key *key)
+{
+   st

[PATCH v4 3/8] btrfs: Factor out enumeration of chunks to a separate function

2011-04-07 Thread Hugo Mills
The main balance function has two loops which are functionally
identical in their looping mechanism, but which perform a different
operation on the chunks they loop over. To avoid repeating code more
than necessary, factor this loop out into a separate iterator function
which takes a function parameter for the action to be performed.

Signed-off-by: Hugo Mills 
---
 fs/btrfs/volumes.c |  179 +---
 1 files changed, 99 insertions(+), 80 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5378b94..ffba817 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2029,6 +2029,97 @@ static u64 div_factor(u64 num, int factor)
return num;
 }
 
+/* Define a type, and two functions which can be used for the two
+ * phases of the balance operation: one for counting chunks, and one
+ * for actually moving them. */
+typedef void (*balance_iterator_function)(struct btrfs_root *,
+ struct btrfs_balance_info *,
+ struct btrfs_path *,
+ struct btrfs_key *);
+
+void balance_count_chunks(struct btrfs_root *chunk_root,
+ struct btrfs_balance_info *bal_info,
+ struct btrfs_path *path,
+ struct btrfs_key *key)
+{
+   spin_lock(&chunk_root->fs_info->balance_info_lock);
+   bal_info->expected++;
+   spin_unlock(&chunk_root->fs_info->balance_info_lock);
+}
+
+void balance_move_chunks(struct btrfs_root *chunk_root,
+struct btrfs_balance_info *bal_info,
+struct btrfs_path *path,
+struct btrfs_key *key)
+{
+   int ret;
+
+   ret = btrfs_relocate_chunk(chunk_root,
+  chunk_root->root_key.objectid,
+  key->objectid,
+  key->offset);
+   BUG_ON(ret && ret != -ENOSPC);
+   spin_lock(&chunk_root->fs_info->balance_info_lock);
+   bal_info->completed++;
+   spin_unlock(&chunk_root->fs_info->balance_info_lock);
+   printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
+  bal_info->completed, bal_info->expected);
+}
+
+/* Iterate through all chunks, performing some function on each one. */
+int balance_iterate_chunks(struct btrfs_root *chunk_root,
+  struct btrfs_balance_info *bal_info,
+  balance_iterator_function fn)
+{
+   int ret = 0;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+   key.offset = (u64)-1;
+   key.type = BTRFS_CHUNK_ITEM_KEY;
+
+   while (!bal_info->cancel_pending) {
+   ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
+   if (ret < 0)
+   break;
+   /*
+* this shouldn't happen, it means the last relocate
+* failed
+*/
+   if (ret == 0)
+   break;
+
+   ret = btrfs_previous_item(chunk_root, path, 0,
+ BTRFS_CHUNK_ITEM_KEY);
+   if (ret)
+   break;
+
+   btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+ path->slots[0]);
+   if (found_key.objectid != key.objectid)
+   break;
+
+   /* chunk zero is special */
+   if (found_key.offset == 0)
+   break;
+
+   /* Call the function to do the work for this chunk */
+   btrfs_release_path(chunk_root, path);
+   fn(chunk_root, bal_info, path, &found_key);
+
+   key.offset = found_key.offset - 1;
+   }
+
+   btrfs_free_path(path);
+   return ret;
+}
+
 int btrfs_balance(struct btrfs_root *dev_root)
 {
int ret;
@@ -2036,11 +2127,8 @@ int btrfs_balance(struct btrfs_root *dev_root)
struct btrfs_device *device;
u64 old_size;
u64 size_to_free;
-   struct btrfs_path *path;
-   struct btrfs_key key;
struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
struct btrfs_trans_handle *trans;
-   struct btrfs_key found_key;
struct btrfs_balance_info *bal_info;
 
if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
@@ -2061,8 +2149,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
}
spin_lock(&dev_root->fs_info->balance_info_lock);
dev_root->fs_info->balance_info = bal_info;
-   bal_info->expected = -1; /* One less than actually counted,
-   because chunk 0 is special */
+   bal_info->expecte

[PATCH v4 8/8] btrfs: Balance filter for physical device address

2011-04-07 Thread Hugo Mills
Add a filter for balancing which allows the selection of chunks with
data in the given byte range on any block device in the filesystem. On
its own, this filter is of little use, but when used with the devid
filter, it can be used to rebalance all chunks which lie on a part of
a specific device.

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ioctl.h   |    9 +++--
 fs/btrfs/volumes.c |   19 +++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 5177229..b13f14d 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -168,7 +168,8 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x2
 #define BTRFS_BALANCE_FILTER_DEVID 0x4
 #define BTRFS_BALANCE_FILTER_VIRTUAL_ADDRESS_RANGE 0x8
-#define BTRFS_BALANCE_FILTER_MASK 0xf /* Logical or of all filter
+#define BTRFS_BALANCE_FILTER_DEVICE_ADDRESS_RANGE 0x10
+#define BTRFS_BALANCE_FILTER_MASK 0x1f /* Logical or of all filter
   * flags -- effectively versions
   * the filtered balance ioctl */
 
@@ -192,7 +193,11 @@ struct btrfs_ioctl_balance_start {
__u64 vrange_start;
__u64 vrange_end;
 
-   __u64 spare[503]; /* Make up the size of the structure to 4088
+   /* For FILTER_DEVICE_ADDRESS_RANGE */
+   __u64 drange_start;
+   __u64 drange_end;
+
+   __u64 spare[501]; /* Make up the size of the structure to 4088
   * bytes for future expansion */
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 83f13b6..f97f19f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2123,6 +2123,25 @@ int balance_chunk_filter(struct btrfs_ioctl_balance_start *filter,
if (filter->vrange_start >= end || start >= filter->vrange_end)
return 0;
}
+   if (filter->flags & BTRFS_BALANCE_FILTER_DEVICE_ADDRESS_RANGE) {
+   int num_stripes = btrfs_chunk_num_stripes(eb, chunk);
+   u64 stripe_length = div_u64(btrfs_chunk_length(eb, chunk)
+   * replinfo.num_copies, num_stripes);
+   int res = 0;
+
+   for (i = 0; i < num_stripes; i++) {
+   struct btrfs_stripe *stripe = btrfs_stripe_nr(chunk, i);
+   u64 start = btrfs_stripe_offset(eb, stripe);
+   u64 end = start + stripe_length;
+   if (filter->drange_start < end
+   && start < filter->drange_end) {
+   res = 1;
+   break;
+   }
+   }
+   if (!res)
+   return 0;
+   }
 
return 1;
 }
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 6/8] btrfs: Balance filter for virtual address ranges

2011-04-07 Thread Hugo Mills
Allow the balancing of chunks where some part of the chunk lies within
the virtual (i.e. btrfs-internal) address range passed.

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ioctl.h   |    9 +++--
 fs/btrfs/volumes.c |    6 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 29627ca..5177229 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -167,7 +167,8 @@ struct btrfs_ioctl_balance_progress {
 
 #define BTRFS_BALANCE_FILTER_CHUNK_TYPE 0x2
 #define BTRFS_BALANCE_FILTER_DEVID 0x4
-#define BTRFS_BALANCE_FILTER_MASK 0x7 /* Logical or of all filter
+#define BTRFS_BALANCE_FILTER_VIRTUAL_ADDRESS_RANGE 0x8
+#define BTRFS_BALANCE_FILTER_MASK 0xf /* Logical or of all filter
   * flags -- effectively versions
   * the filtered balance ioctl */
 
@@ -187,7 +188,11 @@ struct btrfs_ioctl_balance_start {
/* For FILTER_DEVID */
__u64 devid;
 
-   __u64 spare[505]; /* Make up the size of the structure to 4088
+   /* For FILTER_VIRTUAL_ADDRESS_RANGE */
+   __u64 vrange_start;
+   __u64 vrange_end;
+
+   __u64 spare[503]; /* Make up the size of the structure to 4088
   * bytes for future expansion */
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4f215e7..4c1b5a6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2070,6 +2070,12 @@ int balance_chunk_filter(struct btrfs_ioctl_balance_start *filter,
if (!res)
return 0;
}
+   if (filter->flags & BTRFS_BALANCE_FILTER_VIRTUAL_ADDRESS_RANGE) {
+   u64 start = key->offset;
+   u64 end = start + btrfs_chunk_length(eb, chunk);
+   if (filter->vrange_start >= end || start >= filter->vrange_end)
+   return 0;
+   }
 
return 1;
 }
-- 
1.7.2.5



[PATCH v4 0/8] Balance management

2011-04-07 Thread Hugo Mills
[Let's try that again in separate emails, shall we? I'm an idiot.]

   Hi, Chris,

   This is a version of my original balance management patches, rebased
onto the latest kernel. I also include a series of patches which
introduce filtered or partial balances. With these patches, it is
possible to rebalance chunks on the basis of:

 * their chunk flags
 * residency on any device
 * physical block device address
 * logical btrfs-internal address

with a clean infrastructure for implementing further balance filters,
and a forward-compatible ioctl for starting filters.

   Hugo.

---

Hugo Mills (8):
  btrfs: Balance progress monitoring
  btrfs: Cancel filesystem balance
  btrfs: Factor out enumeration of chunks to a separate function
  btrfs: Implement filtered balance ioctl
  btrfs: Balance filter for device ID
  btrfs: Balance filter for virtual address ranges
  btrfs: Replication-type information
  btrfs: Balance filter for physical device address

 fs/btrfs/ctree.h   |   10 ++
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/ioctl.c   |  102 +++-
 fs/btrfs/ioctl.h   |   49 +++
 fs/btrfs/super.c   |   16 +--
 fs/btrfs/volumes.c |  353 ---
 fs/btrfs/volumes.h |   21 +++-
 7 files changed, 465 insertions(+), 88 deletions(-)

-- 
1.7.2.5



[PATCH v3 0/8] Balance management

2011-04-07 Thread Hugo Mills
   Hi, Chris,

   This is a version of my original balance management patches, rebased
onto the latest kernel. I also include a series of patches which
introduce filtered or partial balances. With these patches, it is
possible to rebalance chunks on the basis of:

 * their chunk flags
 * residency on any device
 * physical block device address
 * logical btrfs-internal address

with a clean infrastructure for implementing further balance filters,
and a forward-compatible ioctl for starting filters.

   Hugo.

---

Hugo Mills (8):
  btrfs: Balance progress monitoring
  btrfs: Cancel filesystem balance
  btrfs: Factor out enumeration of chunks to a separate function
  btrfs: Implement filtered balance ioctl
  btrfs: Balance filter for device ID
  btrfs: Balance filter for virtual address ranges
  btrfs: Replication-type information
  btrfs: Balance filter for physical device address

 fs/btrfs/ctree.h   |   10 ++
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/ioctl.c   |  102 +++-
 fs/btrfs/ioctl.h   |   49 +++
 fs/btrfs/super.c   |   16 +--
 fs/btrfs/volumes.c |  353 ---
 fs/btrfs/volumes.h |   21 +++-
 7 files changed, 465 insertions(+), 88 deletions(-)

-- 
1.7.2.5

From 60844e06b991d6b629a856a28d9c528a0aed4b89 Mon Sep 17 00:00:00 2001
From: Hugo Mills 
Date: Thu, 7 Apr 2011 17:38:43 +0100
Subject: [PATCH v3 1/8] btrfs: Balance progress monitoring

This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

Signed-off-by: Hugo Mills 
---
 fs/btrfs/ctree.h   |    9 +++
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/ioctl.c   |   34 +
 fs/btrfs/ioctl.h   |    7 ++
 fs/btrfs/volumes.c |   61 ++-
 5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7f78cc7..6c5526c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -865,6 +865,11 @@ struct btrfs_block_group_cache {
struct list_head cluster_list;
 };
 
+struct btrfs_balance_info {
+   u64 expected;
+   u64 completed;
+};
+
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
@@ -1078,6 +1083,10 @@ struct btrfs_fs_info {
 
/* filesystem state */
u64 fs_state;
+
+   /* Keep track of any rebalance operations on this FS */
+   spinlock_t balance_info_lock;
+   struct btrfs_balance_info *balance_info;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 100b07f..3d690de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1645,6 +1645,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->ref_cache_lock);
spin_lock_init(&fs_info->fs_roots_radix_lock);
spin_lock_init(&fs_info->delayed_iput_lock);
+   spin_lock_init(&fs_info->balance_info_lock);
 
init_completion(&fs_info->kobj_unregister);
fs_info->tree_root = tree_root;
@@ -1670,6 +1671,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
fs_info->sb = sb;
fs_info->max_inline = 8192 * 1024;
fs_info->metadata_ratio = 0;
+   fs_info->balance_info = NULL;
 
fs_info->thread_pool_size = min_t(unsigned long,
  num_online_cpus() + 2, 8);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5fdb2ab..a8fbb07 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2375,6 +2375,38 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
return btrfs_wait_for_commit(root, transid);
 }
 
+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+   struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+   int ret = 0;
+   struct btrfs_ioctl_balance_progress dest;
+
+   spin_lock(&fs_info->balance_info_lock);
+   if (!fs_info->balance_info) {
+   ret = -EINVAL;
+   goto error;
+   }
+
+   dest.expected = fs_info->balance_info->expected;
+   dest.completed = fs_info->balance_info->completed;
+
+   spin_unlock(&fs_info->balance_info_lock);
+
+   if (copy_to_user(user_dest, &dest,
+sizeof(struct btrfs_ioctl_balance_progress)))
+   return -EFAULT;
+
+   return 0;
+
+error:
+   spin_unlock(&fs_info->balance_info_lock);
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2414,6 +2446,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_rm_dev(root, argp);
case BTRFS_IOC_BALANCE:
return btrfs_balance(root->fs_info->dev_root);
+   case BTRFS_IOC_BALANCE_PROGRESS:
+   return btrfs_ioctl_balance_progress(root->fs_info, ar

Re: 2.6.29-rc2 oops and assertion failure...

2011-04-07 Thread Josef Bacik

On 04/07/2011 03:21 AM, Daniel J Blueman wrote:

When running a practical stress-test on 2.6.29-rc2 trying to reproduce
an older (extent refcounting) issue, I am consistently able to hit an
oops [1] and an assertion failure [2].



Sorry about that; please apply the patch I sent this morning:

[PATCH] Btrfs: deal with the case that we run out of space in the cache

Thanks,

Josef


Re: 2.6.39-rc1: btrfs "WARNING: at fs/btrfs/inode.c:2177"

2011-04-07 Thread Josef Bacik

On 04/07/2011 05:41 AM, Jeff Wu wrote:

Hi ,
I ran an iozone stress test on a ceph client (x86_64) against a
ceph 0.26 + linux-2.6.39-rc1 server, and the kernel printed
"WARNING: at fs/btrfs/inode.c:2177".

Crap, I was hoping I had fixed this. Could you run with this debug patch
and send me the output so I can figure out what's going on?  Thanks,


Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..79ec933 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3696,6 +3696,7 @@ int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
 {
int ret;
 
+   WARN_ON(block_rsv == root->orphan_block_rsv);
if (num_bytes == 0)
return 0;
 


[PATCH] Btrfs: deal with the case that we run out of space in the cache

2011-04-07 Thread Josef Bacik
Currently we don't handle running out of space in the cache, so to fix this we
keep track of how far into the cache we are.  We then only dirty the pages if
we successfully modify all of them; otherwise, if we hit an error or run out of
space, we can just drop them and not worry about the VM writing them out.
Thanks,

Tested-by: Johannes Hirte 
Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h            |    5 ++
 fs/btrfs/file.c             |   21 +++
 fs/btrfs/free-space-cache.c |  117 ---
 3 files changed, 69 insertions(+), 74 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3458b57..0d00a07 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2576,6 +2576,11 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, struct inode *inode,
 int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  struct inode *inode, u64 start, u64 end);
 int btrfs_release_file(struct inode *inode, struct file *file);
+void btrfs_drop_pages(struct page **pages, size_t num_pages);
+int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
+ struct page **pages, size_t num_pages,
+ loff_t pos, size_t write_bytes,
+ struct extent_state **cached);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e621ea5..75899a0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -104,7 +104,7 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
 /*
  * unlocks pages after btrfs_file_write is done with them
  */
-static noinline void btrfs_drop_pages(struct page **pages, size_t num_pages)
+void btrfs_drop_pages(struct page **pages, size_t num_pages)
 {
size_t i;
for (i = 0; i < num_pages; i++) {
@@ -127,16 +127,13 @@ static noinline void btrfs_drop_pages(struct page **pages, size_t num_pages)
  * this also makes the decision about creating an inline extent vs
  * doing real data extents, marking pages dirty and delalloc as required.
  */
-static noinline int dirty_and_release_pages(struct btrfs_root *root,
-   struct file *file,
-   struct page **pages,
-   size_t num_pages,
-   loff_t pos,
-   size_t write_bytes)
+int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
+ struct page **pages, size_t num_pages,
+ loff_t pos, size_t write_bytes,
+ struct extent_state **cached)
 {
int err = 0;
int i;
-   struct inode *inode = fdentry(file)->d_inode;
u64 num_bytes;
u64 start_pos;
u64 end_of_last_block;
@@ -149,7 +146,7 @@ static noinline int dirty_and_release_pages(struct btrfs_root *root,
 
end_of_last_block = start_pos + num_bytes - 1;
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
-   NULL);
+   cached);
if (err)
return err;
 
@@ -992,9 +989,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
}
 
if (copied > 0) {
-   ret = dirty_and_release_pages(root, file, pages,
- dirty_pages, pos,
- copied);
+   ret = btrfs_dirty_pages(root, inode, pages,
+   dirty_pages, pos, copied,
+   NULL);
if (ret) {
btrfs_delalloc_release_space(inode,
dirty_pages << PAGE_CACHE_SHIFT);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f561c95..a3f420d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -508,6 +508,7 @@ int btrfs_write_out_cache(struct btrfs_root *root,
struct inode *inode;
struct rb_node *node;
struct list_head *pos, *n;
+   struct page **pages;
struct page *page;
struct extent_state *cached_state = NULL;
struct btrfs_free_cluster *cluster = NULL;
@@ -517,13 +518,13 @@ int btrfs_write_out_cache(struct btrfs_root *root,
u64 start, end, len;
u64 bytes = 0;
u32 *crc, *checksums;
-   pgoff_t index = 0, last_index = 0;
unsigned long first_page_offset;
-   int num_checksums;
+   int index = 0, num_pages = 0;
int entries = 0;
int bitmaps = 0;
int ret = 0;
bool next_page = false;
+   bool out_of_space = false;
 
root = root->fs_info->tree_root;
 
@@ -551,24 +55

Re: BUG: unable to handle kernel NULL pointer dereference at (null)

2011-04-07 Thread Jordan Patterson
On Thu, Apr 7, 2011 at 9:44 AM, Jordan Patterson  wrote:
> On Thu, Apr 7, 2011 at 7:17 AM, Josef Bacik  wrote:
>> On Wed, Apr 06, 2011 at 02:47:28PM -0600, Jordan Patterson wrote:
>>> Hi Josef:
>>>
>>> I tried your latest patch, since I had the same issue from the first
>>> email.  With the patch applied, I am now hitting the
>>> BUG_ON(block_group->total_bitmaps >= max_bitmaps); in add_new_bitmap
>>> in
>>> fs/btrfs/free-space-cache.c:1246 as soon as I mount the filesystem,
>>> with or without -o clear_cache.
>>>
>>> It works fine in 2.6.38.  I get the same error after mounting with
>>> clear_cache under 2.6.38 and rebooting into the current kernel with
>>> your patch.
>>>
>>
>> Do you have a backtrace so I can see how we're getting here?  This is a
>> separate issue from the one this patch tries to solve, but now that it
>> seems that's fixed I will work on this now :).  Thanks,
>>
>> Josef
>>
>
> I wasn't able to test until now, but Johannes' suggestion may have
> fixed the issue for me.  I added clear_cache to my rootflags in grub,
> and it now mounts fine with the current btrfs code plus your last
> patch.  I don't have the backtrace, but I'll send it to you if I
> see it happen again.
>
> Thanks.
>
> Jordan
>


Re: BUG: unable to handle kernel NULL pointer dereference at (null)

2011-04-07 Thread Josef Bacik
On Wed, Apr 06, 2011 at 02:47:28PM -0600, Jordan Patterson wrote:
> Hi Josef:
> 
> I tried your latest patch, since I had the same issue from the first
> email.  With the patch applied, I am now hitting the
> BUG_ON(block_group->total_bitmaps >= max_bitmaps); in add_new_bitmap
> in
> fs/btrfs/free-space-cache.c:1246 as soon as I mount the filesystem,
> with or without -o clear_cache.
> 
> It works fine in 2.6.38.  I get the same error after mounting with
> clear_cache under 2.6.38 and rebooting into the current kernel with
> your patch.
> 

Do you have a backtrace so I can see how we're getting here?  This is a separate
issue from the one this patch tries to solve, but now that it seems that's fixed
I will work on this now :).  Thanks,

Josef


[PATCH] btrfs-progs: cast u64 to long long to avoid printf warnings

2011-04-07 Thread Anton Blanchard

When building on ppc64, I hit a number of printf format warnings:

btrfs-map-logical.c:69: error: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 4 has type ‘u64’

Fix them.

Signed-off-by: Anton Blanchard 
---

diff --git a/btrfs-list.c b/btrfs-list.c
index 93766a8..c602b87 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -249,7 +249,8 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri)
break;
}
}
-   printf("ID %llu top level %llu path %s\n", ri->root_id, top_id,
+   printf("ID %llu top level %llu path %s\n",
+  (unsigned long long)ri->root_id, (unsigned long long)top_id,
   full_path);
free(full_path);
return 0;
diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
index a109c6a..9e9806d 100644
--- a/btrfs-map-logical.c
+++ b/btrfs-map-logical.c
@@ -65,8 +65,8 @@ struct extent_buffer *debug_read_block(struct btrfs_root *root, u64 bytenr,
eb->dev_bytenr = multi->stripes[0].physical;
 
fprintf(info_file, "mirror %d logical %Lu physical %Lu "
-   "device %s\n", mirror_num, bytenr, eb->dev_bytenr,
-   device->name);
+   "device %s\n", mirror_num, (unsigned long long)bytenr,
+   (unsigned long long)eb->dev_bytenr, device->name);
kfree(multi);
 
if (!copy || mirror_num == copy)
diff --git a/btrfsctl.c b/btrfsctl.c
index 92bdf39..896999f 100644
--- a/btrfsctl.c
+++ b/btrfsctl.c
@@ -245,7 +245,7 @@ int main(int ac, char **av)
args.fd = fd;
ret = ioctl(snap_fd, command, &args);
} else if (command == BTRFS_IOC_DEFAULT_SUBVOL) {
-   printf("objectid is %llu\n", objectid);
+   printf("objectid is %llu\n", (unsigned long long)objectid);
ret = ioctl(fd, command, &objectid);
} else
ret = ioctl(fd, command, &args);
diff --git a/debug-tree.c b/debug-tree.c
index 0525354..e8ee64e 100644
--- a/debug-tree.c
+++ b/debug-tree.c
@@ -166,7 +166,8 @@ int main(int ac, char **av)
  root->nodesize, 0);
}
if (!leaf) {
-   fprintf(stderr, "failed to read %llu\n", block_only);
+   fprintf(stderr, "failed to read %llu\n",
+   (unsigned long long)block_only);
return 0;
}
btrfs_print_tree(root, leaf, 0);
diff --git a/disk-io.c b/disk-io.c
index a6e1000..5295dca 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -678,7 +678,8 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
   ~BTRFS_FEATURE_INCOMPAT_SUPP;
if (features) {
printk("couldn't open because of unsupported "
-  "option features (%Lx).\n", features);
+  "option features (%Lx).\n",
+  (unsigned long long)features);
BUG_ON(1);
}
 
@@ -692,7 +693,8 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
~BTRFS_FEATURE_COMPAT_RO_SUPP;
if (writes && features) {
printk("couldn't open RDWR because of unsupported "
-  "option features (%Lx).\n", features);
+  "option features (%Lx).\n",
+  (unsigned long long)features);
BUG_ON(1);
}
 
diff --git a/extent-tree.c b/extent-tree.c
index b2f9bb2..3a09f2f 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1448,7 +1448,8 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
goto out;
if (ret != 0) {
btrfs_print_leaf(root, path->nodes[0]);
-   printk("failed to find block number %Lu\n", bytenr);
+   printk("failed to find block number %Lu\n",
+  (unsigned long long)bytenr);
BUG();
}
 
diff --git a/print-tree.c b/print-tree.c
index ac575d5..c673dcb 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -497,7 +497,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l)
case BTRFS_DIR_LOG_ITEM_KEY:
dlog = btrfs_item_ptr(l, i, struct btrfs_dir_log_item);
printf("\t\tdir log end %Lu\n",
-  btrfs_dir_log_end(l, dlog));
+  (unsigned long long)btrfs_dir_log_end(l, dlog));
   break;
case BTRFS_ORPHAN_ITEM_KEY:
printf("\t\torphan item\n");


Re: 2.6.39-rc1: btrfs "WARNING: at fs/btrfs/inode.c:2177"

2011-04-07 Thread Wido den Hollander
Hi,

On Thu, 2011-04-07 at 17:41 +0800, Jeff Wu wrote:
> Hi,
> I ran an iozone stress test from an x86_64 ceph client against a
> ceph 0.26 + linux-2.6.39-rc1 server, and the kernel printed
> "WARNING: at fs/btrfs/inode.c:2177"
> 
> 1.log1 :

This is a known issue, see: http://tracker.newdream.net/issues/563

It has been passed upstream to the btrfs developers and they are now
working on it.

> 
> ...

2.6.39-rc1: btrfs "WARNING: at fs/btrfs/inode.c:2177"

2011-04-07 Thread Jeff Wu

Hi,
I ran an iozone stress test from an x86_64 ceph client against a
ceph 0.26 + linux-2.6.39-rc1 server, and the kernel printed
"WARNING: at fs/btrfs/inode.c:2177"

1.log1 :

...
[ 1663.370008] CE: hpet2 increased min_delta_ns to 7500 nsec
[ 1663.375399] CE: hpet2 increased min_delta_ns to 11250 nsec
[ 4945.270388] [ cut here ]
[ 4945.275011] WARNING: at fs/btrfs/inode.c:2177
btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[ 4945.283326] Hardware name: OptiPlex 780 
[ 4945.288629] Modules linked in: i915 fbcon tileblit font
snd_hda_codec_analog bitblit btrfs softcursor drm_kms_helper drm
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd zlib_deflate
psmouse crc32c i2c_algo_bit soundcore snd_page_alloc libcrc32c ppdev
intel_agp parport_pc lp intel_gtt dell_wmi video parport serio_raw
sparse_keymap dcdbas r8169 mii
[ 4945.320862] Pid: 581, comm: btrfs-transacti Not tainted 2.6.39-rc1 #1
[ 4945.327282] Call Trace:
[ 4945.329722]  [] warn_slowpath_common+0x7f/0xc0
[ 4945.335755]  [] warn_slowpath_null+0x1a/0x20
[ 4945.341602]  [] btrfs_orphan_commit_root+0xb0/0xc0
[btrfs]
[ 4945.348636]  [] commit_fs_roots+0xa1/0x140 [btrfs]
[ 4945.355017]  [] btrfs_commit_transaction
+0x349/0x750 [btrfs]
[ 4945.362259]  [] ? wake_up_bit+0x40/0x40
[ 4945.367649]  [] transaction_kthread+0x283/0x290
[btrfs]
[ 4945.374460]  [] ? btrfs_bio_wq_end_io+0x90/0x90
[btrfs]
[ 4945.381269]  [] ? btrfs_bio_wq_end_io+0x90/0x90
[btrfs]
[ 4945.388080]  [] kthread+0x96/0xa0
[ 4945.392989]  [] kernel_thread_helper+0x4/0x10
[ 4945.398895]  [] ? __init_kthread_worker+0x40/0x40
[ 4945.405183]  [] ? gs_change+0x13/0x13
[ 4945.410409] ---[ end trace 87e42fb6fdfeba78 ]---
[ 5159.980447] [ cut here ]
[ 5159.985072] WARNING: at fs/btrfs/inode.c:2177
btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[ 5159.993353] Hardware name: OptiPlex 780 
[ 5159.998648] Modules linked in: i915 fbcon tileblit font
snd_hda_codec_analog bitblit btrfs softcursor drm_kms_helper drm
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd zlib_deflate
psmouse crc32c i2c_algo_bit soundcore snd_page_alloc libcrc32c ppdev
intel_agp parport_pc lp intel_gtt dell_wmi video parport serio_raw
sparse_keymap dcdbas r8169 mii
[ 5160.033593] Pid: 581, comm: btrfs-transacti Tainted: GW
2.6.39-rc1 #1
[ 5160.041904] Call Trace:
[ 5160.045326]  [] warn_slowpath_common+0x7f/0xc0
[ 5160.052312]  [] warn_slowpath_null+0x1a/0x20
[ 5160.059102]  [] btrfs_orphan_commit_root+0xb0/0xc0
[btrfs]
[ 5160.067143]  [] commit_fs_roots+0xa1/0x140 [btrfs]
[ 5160.074485]  [] btrfs_commit_transaction
+0x349/0x750 [btrfs]
[ 5160.082687]  [] ? wake_up_bit+0x40/0x40
[ 5160.089034]  [] transaction_kthread+0x283/0x290
[btrfs]
[ 5160.096811]  [] ? btrfs_bio_wq_end_io+0x90/0x90
[btrfs]
[ 5160.104568]  [] ? btrfs_bio_wq_end_io+0x90/0x90
[btrfs]
[ 5160.112294]  [] kthread+0x96/0xa0
[ 5160.118093]  [] kernel_thread_helper+0x4/0x10
[ 5160.124964]  [] ? __init_kthread_worker+0x40/0x40
[ 5160.132166]  [] ? gs_change+0x13/0x13
[ 5160.138292] ---[ end trace 87e42fb6fdfeba79 ]---
[ 5434.340486] [ cut here ]
[ 5434.346091] WARNING: at fs/btrfs/inode.c:2177
btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[ 5434.355303] Hardware name: OptiPlex 780 
[ 5434.361640] Modules linked in: i915 fbcon tileblit font
snd_hda_codec_analog bitblit btrfs softcursor drm_kms_helper drm
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd zlib_deflate
psmouse crc32c i2c_algo_bit soundcore snd_page_alloc libcrc32c ppdev
intel_agp parport_pc lp intel_gtt dell_wmi video parport serio_raw
sparse_keymap dcdbas r8169 mii
[ 5434.396772] Pid: 581, comm: btrfs-transacti Tainted: GW
2.6.39-rc1 #1
[ 5434.405087] Call Trace:
[ 5434.408546]  [] warn_slowpath_common+0x7f/0xc0
[ 5434.415523]  [] warn_slowpath_null+0x1a/0x20
[ 5434.422300]  [] btrfs_orphan_commit_root+0xb0/0xc0
[btrfs]
[ 5434.430290]  [] commit_fs_roots+0xa1/0x140 [btrfs]
[ 5434.437562]  [] btrfs_commit_transaction
+0x349/0x750 [btrfs]
[ 5434.445738]  [] ? wake_up_bit+0x40/0x40
[ 5434.452087]  [] transaction_kthread+0x283/0x290
[btrfs]
[ 5434.459792]  [] ? btrfs_bio_wq_end_io+0x90/0x90
[btrfs]
[ 5434.467542]  [] ? btrfs_bio_wq_end_io+0x90/0x90
[btrfs]
[ 5434.475291]  [] kthread+0x96/0xa0
[ 5434.481082]  [] kernel_thread_helper+0x4/0x10
[ 5434.487949]  [] ? __init_kthread_worker+0x40/0x40
[ 5434.495160]  [] ? gs_change+0x13/0x13
[ 5434.501314] ---[ end trace 87e42fb6fdfeba7a ]---
[ 5525.450514] [ cut here ]
[ 5525.456066] WARNING: at fs/btrfs/inode.c:2177
btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
[ 5525.465286] Hardware name: OptiPlex 780 
[ 5525.471543] Modules linked in: i915 fbcon tileblit font
snd_hda_codec_analog bitblit btrfs softcursor drm_kms_helper drm
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd zlib_deflate
psmouse crc32c i2c_algo_bit soundcore snd_

Re: BUG: unable to handle kernel NULL pointer dereference at (null)

2011-04-07 Thread Johannes Hirte
On Wednesday 06 April 2011 19:15:41 Josef Bacik wrote:
> On Wed, Apr 06, 2011 at 01:10:38PM +0200, Johannes Hirte wrote:
> > On Tuesday 05 April 2011 23:57:53 Josef Bacik wrote:
> > > > Now it hit
> > > 
> > > Man I cannot catch a break.  I hope this is the last one.  Thanks,
> 
> OK, I give up. I just cleaned it all up and don't mark the pages as dirty
> unless we're actually going to succeed at writing them.  This should fix
> everything.
> 
> ---
>  fs/btrfs/ctree.h            |    5 ++
>  fs/btrfs/file.c             |   21 +++
>  fs/btrfs/free-space-cache.c |  117 ---
>  3 files changed, 69 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3458b57..0d00a07 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2576,6 +2576,11 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, struct inode *inode,
>  int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
>  struct inode *inode, u64 start, u64 end);
>  int btrfs_release_file(struct inode *inode, struct file *file);
> +void btrfs_drop_pages(struct page **pages, size_t num_pages);
> +int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
> +   struct page **pages, size_t num_pages,
> +   loff_t pos, size_t write_bytes,
> +   struct extent_state **cached);
> 
>  /* tree-defrag.c */
>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e621ea5..75899a0 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -104,7 +104,7 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
>  /*
>   * unlocks pages after btrfs_file_write is done with them
>   */
> -static noinline void btrfs_drop_pages(struct page **pages, size_t num_pages)
> +void btrfs_drop_pages(struct page **pages, size_t num_pages)
>  {
>   size_t i;
>   for (i = 0; i < num_pages; i++) {
> @@ -127,16 +127,13 @@ static noinline void btrfs_drop_pages(struct page **pages, size_t num_pages)
>   * this also makes the decision about creating an inline extent vs
>   * doing real data extents, marking pages dirty and delalloc as required.
>   */
> -static noinline int dirty_and_release_pages(struct btrfs_root *root,
> - struct file *file,
> - struct page **pages,
> - size_t num_pages,
> - loff_t pos,
> - size_t write_bytes)
> +int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
> +   struct page **pages, size_t num_pages,
> +   loff_t pos, size_t write_bytes,
> +   struct extent_state **cached)
>  {
>   int err = 0;
>   int i;
> - struct inode *inode = fdentry(file)->d_inode;
>   u64 num_bytes;
>   u64 start_pos;
>   u64 end_of_last_block;
> @@ -149,7 +146,7 @@ static noinline int dirty_and_release_pages(struct btrfs_root *root,
> 
>   end_of_last_block = start_pos + num_bytes - 1;
>   err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
> - NULL);
> + cached);
>   if (err)
>   return err;
> 
> @@ -992,9 +989,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   }
> 
>   if (copied > 0) {
> - ret = dirty_and_release_pages(root, file, pages,
> -   dirty_pages, pos,
> -   copied);
> + ret = btrfs_dirty_pages(root, inode, pages,
> + dirty_pages, pos, copied,
> + NULL);
>   if (ret) {
>   btrfs_delalloc_release_space(inode,
>   dirty_pages << PAGE_CACHE_SHIFT);
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index f561c95..a3f420d 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -508,6 +508,7 @@ int btrfs_write_out_cache(struct btrfs_root *root,
>   struct inode *inode;
>   struct rb_node *node;
>   struct list_head *pos, *n;
> + struct page **pages;
>   struct page *page;
>   struct extent_state *cached_state = NULL;
>   struct btrfs_free_cluster *cluster = NULL;
> @@ -517,13 +518,13 @@ int btrfs_write_out_cache(struct btrfs_root *root,
>   u64 start, end, len;
>   u64 bytes = 0;
>   u32 *crc, *checksums;
> - pgoff_t index = 0, last_index = 0;
>   unsigned long first_page_offset;
> - int num_checksums;
> + int index = 0, num_pages = 0;
>   int entries = 0;
>   int bitmaps = 0;
>   
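
The rule Josef describes ("don't mark the pages as dirty unless we're actually going to succeed at writing them") is an ordering discipline: do every step that can fail first, dirty the pages only afterwards, and on failure just unlock them. The user-space sketch below models only that ordering. struct fake_page, fill_pages(), drop_pages() and write_pages() are hypothetical stand-ins, not kernel code; in the patch the real work is done on page-cache pages through btrfs_dirty_pages() and btrfs_drop_pages().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a locked page-cache page. */
struct fake_page {
	char data[16];
	bool locked;
	bool dirty;
};

/* Fallible step: copy src into every page, failing up front if it won't fit. */
static int fill_pages(struct fake_page *pages, size_t num_pages,
		      const char *src, size_t len)
{
	size_t i;

	if (len > sizeof(pages[0].data))
		return -1;
	for (i = 0; i < num_pages; i++)
		memcpy(pages[i].data, src, len);
	return 0;
}

/* Loose analogue of btrfs_drop_pages(): unlock, never dirty. */
void drop_pages(struct fake_page *pages, size_t num_pages)
{
	size_t i;

	for (i = 0; i < num_pages; i++)
		pages[i].locked = false;
}

/* Dirty the pages only after every fallible step has succeeded. */
int write_pages(struct fake_page *pages, size_t num_pages,
		const char *src, size_t len)
{
	size_t i;
	int ret;

	for (i = 0; i < num_pages; i++)
		pages[i].locked = true;

	ret = fill_pages(pages, num_pages, src, len);
	if (ret == 0) {
		/* Point of no return: mark all pages dirty together. */
		for (i = 0; i < num_pages; i++)
			pages[i].dirty = true;
	}
	/* Pages are unlocked on both paths; only success leaves them dirty. */
	drop_pages(pages, num_pages);
	return ret;
}
```

With this ordering a failed write leaves no page dirty, so nothing half-written can be flushed later.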

[RFC][PATCH] Btrfs: about chunk tree backups

2011-04-07 Thread WuBo
Hi all,

I've been digging into the idea of chunk tree backups. Here is the
preliminary design: before chunk allocation finishes, some information
is written into the first block of the chunk. This information will be
useful for rebuilding the chunk tree after a crash. The first block is
also marked in fs_info->freed_extents[2], just as the super block is,
so that it is excluded from allocation.
To do this, we need to change these functions:
btrfs_make_block_group
btrfs_read_block_groups
btrfs_remove_block_group
What do you think about it?

Backward compatibility is tricky. mkfs.btrfs already creates several
chunks when it creates the filesystem, so it would need to do the same
thing as above. But this can be confusing in some situations, such as
mounting an old filesystem on a new kernel. I think it's better to add an
incompat flag in the super block to mark whether the filesystem was
formatted with the new mkfs.btrfs.
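
The incompat-flag proposal can be illustrated with the same pattern btrfs-progs already uses in open_ctree_fd(): mask the superblock's incompat bits against the supported set and refuse to open the filesystem if anything is left over. This is a hypothetical sketch, not part of the RFC: check_incompat(), expects_chunk_headers() and the OLD/NEW supported masks are assumptions; only the bit values come from fs/btrfs/ctree.h and the hunk below.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64; /* stand-in for the kernel/btrfs-progs type */

/* Bit values as in fs/btrfs/ctree.h, plus the new bit from the RFC. */
#define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF     (1ULL << 0)
#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL    (1ULL << 1)
#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS      (1ULL << 2)
#define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO      (1ULL << 3)
#define BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP (1ULL << 4)

/* Hypothetical: bits understood by a kernel without the RFC. */
#define OLD_INCOMPAT_SUPP (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |  \
			   BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL | \
			   BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |   \
			   BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)

/* Hypothetical: bits understood by a kernel with the RFC applied. */
#define NEW_INCOMPAT_SUPP (OLD_INCOMPAT_SUPP | \
			   BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP)

/*
 * Same shape as the open_ctree_fd() check: any incompat bit outside the
 * supported mask means the filesystem must not be opened.
 * Returns 0 if the filesystem can be opened, -1 otherwise.
 */
int check_incompat(u64 incompat_flags, u64 supported)
{
	u64 unknown = incompat_flags & ~supported;

	if (unknown) {
		fprintf(stderr, "couldn't open because of unsupported "
			"option features (%llx).\n", (unsigned long long)unknown);
		return -1;
	}
	return 0;
}

/* Only a filesystem made by the new mkfs.btrfs carries chunk header blocks. */
int expects_chunk_headers(u64 incompat_flags)
{
	return (incompat_flags & BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP) != 0;
}
```

Under these assumptions an old kernel (OLD_INCOMPAT_SUPP) refuses a filesystem created by the new mkfs.btrfs, while a new kernel mounting an old filesystem sees the bit clear and knows not to look for chunk header blocks, which addresses the "old fs on new kernel" confusion.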

If that's OK, the TODO list is:
- design the information in the chunk's first block so that it is unique
- handle backward compatibility (for example, fix mkfs.btrfs)

Signed-off-by: Wu Bo 
---
 fs/btrfs/ctree.h   |   13 +++-
 fs/btrfs/extent-tree.c |  135 +-
 fs/btrfs/volumes.c |  168 
 fs/btrfs/volumes.h |   25 +++
 4 files changed, 322 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..580dd1c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -41,6 +41,7 @@ extern struct kmem_cache *btrfs_transaction_cachep;
 extern struct kmem_cache *btrfs_bit_radix_cachep;
 extern struct kmem_cache *btrfs_path_cachep;
 struct btrfs_ordered_sum;
+struct map_lookup;
 
 #define BTRFS_MAGIC "_BHRfS_M"
 
@@ -408,6 +409,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL  (1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS(1ULL << 2)
 #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO(1ULL << 3)
+#define BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP (1ULL << 4)
 
 #define BTRFS_FEATURE_COMPAT_SUPP  0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SUPP   0ULL
@@ -415,7 +417,8 @@ struct btrfs_super_block {
(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |\
 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |  \
-BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
+BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |  \
+BTRFS_FEATURE_INCOMPAT_CHUNK_TREE_BACKUP)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
@@ -2172,10 +2175,12 @@ int btrfs_extent_readonly(struct btrfs_root *root, u64 
bytenr);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 int btrfs_read_block_groups(struct btrfs_root *root);
 int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr);
+
 int btrfs_make_block_group(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root, u64 bytes_used,
-  u64 type, u64 chunk_objectid, u64 chunk_offset,
-  u64 size);
+  struct btrfs_root *root, struct map_lookup *map,
+  u64 bytes_used, u64 type, u64 chunk_objectid,
+  u64 chunk_offset, u64 size);
+
 int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 struct btrfs_root *root, u64 group_start);
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f1db57d..27ea7d5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "compat.h"
 #include "hash.h"
 #include "ctree.h"
@@ -231,6 +232,113 @@ static int exclude_super_stripes(struct btrfs_root *root,
return 0;
 }
 
+static int exclude_chunk_stripes_header_slow(struct btrfs_root *root,
+   struct btrfs_block_group_cache *cache)
+{
+   int i;
+   int nr;
+   u64 devid;
+   u64 physical;
+   int stripe_len;
+   u64 stripe_num;
+   u64 *logical;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct btrfs_chunk *chunk;
+   struct btrfs_key found_key;
+   struct extent_buffer *leaf;
+   int ret;
+
+   ret = 0;
+   path = btrfs_alloc_path();
+   if (!path)
+   return -1;
+
+   root = root->fs_info->chunk_root;
+
+   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+   key.offset = cache->key.objectid;
+   key.type = BTRFS_CHUNK_ITEM_KEY;
+
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret != 0)
+   goto error;
+
+   btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]);
+
+   if (found_key.objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID ||
+   btrfs_key_type(&found_key) != BTRFS_CHUNK_ITEM_KEY ||
+   found_key.offset != cache->key.objec

2.6.29-rc2 oops and assertion failure...

2011-04-07 Thread Daniel J Blueman
When running a practical stress-test on 2.6.39-rc2 trying to reproduce
an older (extent refcounting) issue, I am consistently able to hit an
oops [1] and an assertion failure [2].

Here, I'm testing with 8 block ramdisks, configured in the kernel to
256MB each (intentionally testing free-space handling):

for i in `seq 0 7`; do mknod /dev/ram$i b 1 $i; dd if=/dev/zero of=/dev/ram$i bs=1024k count=256; done
mkfs.btrfs -m raid10 -d raid10 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3 /dev/ram4 /dev/ram5 /dev/ram6 /dev/ram7
mount /dev/ram0 /mnt -o space_cache,ssd,nobarrier,compress # try without compress also
cp -xa / /mnt

The next steps are executed in parallel:

while :; do cp -xa / /mnt; done &
while :; do btrfs filesystem balance /mnt; done &
while :; do find /mnt -print0 | xargs -0 btrfs filesystem defragment -c; done &

--- [1]

general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/bus/hid/drivers/generic-usb/new_id
CPU 0
Modules linked in: brd loop [last unloaded: brd]

Pid: 28000, comm: btrfs Tainted: GW   2.6.39-rc2-350cd #2
Supermicro X8STi/X8STi
RIP: 0010:[]  []
btrfs_write_out_cache+0x9d4/0xdf0
RSP: 0018:8802af913968  EFLAGS: 00010246
RAX: db738800 RBX:  RCX: 0200
RDX: 1000 RSI: 8802ba9b1048 RDI: db738800
RBP: 8802af913ae8 R08: 0001 R09: 
R10: 810e8130 R11:  R12: 8802510a3be0
R13: 8802acf8b948 R14: 8802510a3bb0 R15: 8802b9f561c8
FS:  7fabcef8d740() GS:88031fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 03c960c8 CR3: 0002afa29000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 28000, threadinfo 8802af912000, task 8803090d)
Stack:
 8802af913988 0001 8802af913998 8802af913a88
 8802af9139a8  0040 880215059770
 880215059710 0010 8802acf8b908 000f
Call Trace:
 [] ? get_parent_ip+0x11/0x50
 [] ? sub_preempt_count+0x9d/0xd0
 [] btrfs_write_dirty_block_groups+0x2ab/0x300
 [] commit_cowonly_roots+0x105/0x1e0
 [] btrfs_commit_transaction+0x37d/0x720
 [] ? wake_up_bit+0x40/0x40
 [] relocate_block_group+0x4bc/0x600
 [] btrfs_relocate_block_group+0x1a8/0x2d0
 [] btrfs_relocate_chunk+0x6d/0x3b0
 [] ? get_parent_ip+0x11/0x50
 [] ? sub_preempt_count+0x9d/0xd0
 [] btrfs_balance+0x20d/0x280
 [] btrfs_ioctl+0x450/0x590
 [] do_vfs_ioctl+0x8d/0x330
 [] ? fget_light+0x274/0x3c0
 [] ? __do_fault+0x150/0x5d0
 [] sys_ioctl+0x4a/0x80
 [] system_call_fastpath+0x16/0x1b
Code: 89 ad 38 ff ff ff 49 89 c7 4c 8b ad 48 ff ff ff e9 e4 00 00 00
66 90 40 f6 c7 04 0f 85 6e 01 00 00 89 d1 c1 e9 03 f6 c2 04 89 c9 
48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP  [] btrfs_write_out_cache+0x9d4/0xdf0
 RSP 
---[ end trace a7919e7f17c0a728 ]---

--- [2]

kernel BUG at fs/btrfs/relocation.c:4282!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
CPU 0
Modules linked in: brd loop

Pid: 7775, comm: flush-btrfs-1 Tainted: GW   2.6.39-rc2-350cd
#2 Supermicro X8STi/X8STi
RIP: 0010:[]  []
btrfs_reloc_cow_block+0x28b/0x2c0
RSP: 0018:8803057817f0  EFLAGS: 00010246
RAX: 880305728000 RBX: 88030564 RCX: 880235d92e40
RDX: 880209c1f5f0 RSI: 880308bdd168 RDI: 8802ff1fb220
RBP: 880305781850 R08:  R09: 0001
R10: 812d8630 R11:  R12: 880308bdd168
R13: 880209c1f5f0 R14: 8802ff1fb220 R15: 
FS:  () GS:88031fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f7355663650 CR3: 0001f75f7000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process flush-btrfs-1 (pid: 7775, threadinfo 88030578, task
880308fa8000)
Stack:
 880305781850 81276c5d fff7 ea0006cd7e90
  880235d92e40 880305781850 880308bdd168
 880235d92e40 880209c1f5f0 8802ff1fb220 
Call Trace:
 [] ? update_ref_for_cow+0x26d/0x360
 [] __btrfs_cow_block+0x6b1/0x980
 [] btrfs_cow_block+0x11b/0x2c0
 [] btrfs_search_slot+0x3c5/0x790
 [] ? btrfs_alloc_path+0x15/0x30
 [] btrfs_truncate_inode_items+0x110/0x770
 [] ? get_parent_ip+0x11/0x50
 [] ? _raw_spin_unlock+0x30/0x60
 [] btrfs_evict_inode+0x18b/0x200
 [] evict+0x81/0x180
 [] iput_final+0xe6/0x1a0
 [] iput+0x36/0x50
 [] writeback_sb_inodes+0x12e/0x1d0
 [] writeback_inodes_wb+0x7b/0x180
 [] wb_writeback+0x2bb/0x320
 [] ? get_nr_inodes+0x62/0xb0
 [] wb_do_writeback+0x21c/0x230
 [] bdi_writeback_thread+0x92/0x180
 [] ? wb_do_writeback+0x23