machine gets unresponsive during btrfs balance

2010-08-12 Thread Andreas Philipp
Hi,

I am using a btrfs filesystem created with raid0 for data and metadata
for (temporary) storage of tv recordings from my vdr. The filesystem was
created under kernel version 2.6.34. An initial btrfs balance command
succeeded. Since I upgraded to 2.6.35-rcX and 2.6.35, btrfs balance no
longer finishes but leaves the machine unresponsive.
Unfortunately, I do not see any kernel oops or other debug information
because even the display freezes. The last thing that happens is that
these two lines are written to /var/log/messages:
Aug 11 21:42:23 thor kernel: btrfs: found 62911 extents
Aug 11 21:42:24 thor kernel: btrfs: relocating block group 1723913469952 flags 9
After that the machine becomes immediately unresponsive.

As I did not see anything that might be related to my problem in the
changelog for 2.6.35.1 I did not try again with this version.

Thanks,
Andreas

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: machine gets unresponsive during btrfs balance

2010-08-12 Thread Yan, Zheng
On Thu, Aug 12, 2010 at 3:14 PM, Andreas Philipp
<philipp.andr...@gmail.com> wrote:
> Hi,
>
> I am using a btrfs filesystem created with raid0 for data and metadata
> for (temporary) storage of tv recordings from my vdr. The filesystem was
> created under kernel version 2.6.34. An initial btrfs balance command
> succeeded. Since I upgraded to 2.6.35-rcX and 2.6.35, btrfs balance no
> longer finishes but leaves the machine unresponsive.
> Unfortunately, I do not see any kernel oops or other debug information
> because even the display freezes. The last thing that happens is that
> these two lines are written to /var/log/messages:
> Aug 11 21:42:23 thor kernel: btrfs: found 62911 extents
> Aug 11 21:42:24 thor kernel: btrfs: relocating block group 1723913469952 flags 9
> After that the machine becomes immediately unresponsive.
>
> As I did not see anything that might be related to my problem in the
> changelog for 2.6.35.1 I did not try again with this version.


Do you have more than one machine? Would you please set up netconsole
to see what happens?

Thanks
Yan, Zheng


Re: can't unmount

2010-08-12 Thread C Anthony Risinger
On Aug 12, 2010, at 10:58 AM, K. Richard Pixley <r...@noir.com> wrote:

> And should I be worried about what umount -l might be leaving
> behind?  (e.g., any unfreed kernel resources)  Or is that a reasonable
> way to deal with this situation on an ongoing basis?
>
> --rich
>
> On 8/12/10 08:55 , K. Richard Pixley wrote:
>> I'm running into a situation where I can't unmount a mounted
>> snapshot.  It shows busy even though neither lsof nor fuser show
>> any open files.  Umount -f doesn't work although umount -l does.
>>
>> Is there anything else I can do to debug this scenario or to clear
>> the busy status myself?  Or am I down to rebooting each time?
>>
>> This is on stock ubuntu-10.04, x86.
>>
>> --rich

You are lazy unmounting; as I understand it, you are essentially just
hiding from userspace the fact that the mount was busy. The mount will
remain active in the kernel until you resolve whatever was stopping
umount in the first place; the kernel will then silently unmount it.

Does this affect all of your mounted snapshots, or only a particular
one?

C Anthony [mobile]
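
For reference, the lazy unmount discussed above corresponds to the
MNT_DETACH flag of the Linux umount2() call, and umount -f to MNT_FORCE.
The following is only a minimal userspace sketch under that assumption;
the sketch_* helper names are hypothetical and are not part of any tool
mentioned in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Map two hypothetical boolean options onto umount2() flags.
 * MNT_DETACH is the lazy unmount behind "umount -l";
 * MNT_FORCE is what "umount -f" requests.
 */
static int sketch_umount_flags(int lazy, int force)
{
	int flags = 0;

	if (lazy)
		flags |= MNT_DETACH;
	if (force)
		flags |= MNT_FORCE;
	return flags;
}

/*
 * Try a normal unmount first and report why it failed. Only when the
 * failure is EBUSY and the caller asked for it, fall back to a lazy
 * unmount, which detaches the mount point now and lets the kernel
 * finish the unmount once the mount is no longer busy.
 */
static int sketch_try_umount(const char *target, int allow_lazy)
{
	int err;

	assert(target != NULL);
	if (umount2(target, sketch_umount_flags(0, 0)) == 0)
		return 0;

	err = errno; /* save before stdio can overwrite it */
	fprintf(stderr, "umount %s: %s\n", target, strerror(err));
	if (err == EBUSY && allow_lazy)
		return umount2(target, sketch_umount_flags(1, 0));
	return -1;
}
```

Printing the errno from the plain unmount before falling back keeps the
original "busy" condition visible instead of hiding it.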


Re: can't unmount

2010-08-12 Thread K. Richard Pixley

On 8/12/10 10:46 , C Anthony Risinger wrote:
> On Aug 12, 2010, at 10:58 AM, K. Richard Pixley <r...@noir.com> wrote:
>
>> And should I be worried about what umount -l might be leaving
>> behind?  (e.g., any unfreed kernel resources)  Or is that a reasonable
>> way to deal with this situation on an ongoing basis?
>>
>> On 8/12/10 08:55 , K. Richard Pixley wrote:
>>
>>> I'm running into a situation where I can't unmount a mounted
>>> snapshot.  It shows busy even though neither lsof nor fuser show
>>> any open files.  Umount -f doesn't work although umount -l does.
>>>
>>> Is there anything else I can do to debug this scenario or to clear
>>> the busy status myself?  Or am I down to rebooting each time?
>>>
>>> This is on stock ubuntu-10.04, x86.
>
> You are lazy unmounting; as I understand it, you are essentially just
> hiding from userspace the fact that the mount was busy. The mount will
> remain active in the kernel until you resolve whatever was stopping
> umount in the first place; the kernel will then silently unmount it.

Understood.

> Does this affect all of your mounted snapshots, or only a particular
> one?

I'm only mounting one at a time so I haven't noticed.  Will check next
time it occurs.


--rich


[RFC v2 PATCH 1/6] Btrfs: Add experimental hot data hash list index

2010-08-12 Thread bchociej
From: Ben Chociej <bchoc...@gmail.com>

Adds a hash table structure to efficiently look up the data temperature
of a file. Also adds a function to calculate that temperature based on
some metrics kept in custom frequency data structs (in the next patch).

Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Signed-off-by: Conor Scott <consc...@vt.edu>
Reviewed-by: Mingming Cao <c...@us.ibm.com>
---
 fs/btrfs/hotdata_hash.c |  338 +++
 fs/btrfs/hotdata_hash.h |  155 ++
 2 files changed, 493 insertions(+), 0 deletions(-)
 create mode 100644 fs/btrfs/hotdata_hash.c
 create mode 100644 fs/btrfs/hotdata_hash.h
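
As a rough illustration of the temperature calculation described in the
btrfs_get_temp comment in this patch (shift each raw metric, clamp it to
32 bits, take a power-of-two weighted average, then reduce it to the hash
list granularity), here is a userspace sketch. It uses only two of the
six criteria, and every SKETCH_* constant and sketch_* name is an
assumption made for the example, not a value taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* All constants below are hypothetical; the patch defines its own. */
#define SKETCH_HEAT_HASH_BITS   8   /* 2^8 = 256 temperature buckets */
#define SKETCH_NRR_SHIFT        20  /* sensitivity to the read count */
#define SKETCH_NRW_SHIFT        20  /* sensitivity to the write count */
#define SKETCH_NRR_COEFF_POWER  1   /* reads weigh 2^1 in the average */
#define SKETCH_NRW_COEFF_POWER  0   /* writes weigh 2^0 in the average */

/* Saturate a 64-bit intermediate so that it fits in 32 bits. */
static uint32_t sketch_clamp_u32(uint64_t v)
{
	return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/* Reduce two raw counters to a bucket index in [0, 2^HEAT_HASH_BITS). */
static uint32_t sketch_temperature(uint32_t nr_reads, uint32_t nr_writes)
{
	uint64_t nrr = sketch_clamp_u32((uint64_t)nr_reads << SKETCH_NRR_SHIFT);
	uint64_t nrw = sketch_clamp_u32((uint64_t)nr_writes << SKETCH_NRW_SHIFT);
	uint64_t weights = (1ULL << SKETCH_NRR_COEFF_POWER) +
			   (1ULL << SKETCH_NRW_COEFF_POWER);
	/* Weighted average of the calibrated values. */
	uint64_t avg = ((nrr << SKETCH_NRR_COEFF_POWER) +
			(nrw << SKETCH_NRW_COEFF_POWER)) / weights;

	/* A weighted average never exceeds its largest input. */
	assert(avg <= UINT32_MAX);
	/* Keep only the top HEAT_HASH_BITS bits as the bucket index. */
	return (uint32_t)(avg >> (32 - SKETCH_HEAT_HASH_BITS));
}
```

With these made-up constants an untouched file lands in bucket 0 and a
file with saturated counters lands in the hottest bucket, 255.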

diff --git a/fs/btrfs/hotdata_hash.c b/fs/btrfs/hotdata_hash.c
new file mode 100644
index 000..b789edd
--- /dev/null
+++ b/fs/btrfs/hotdata_hash.c
@@ -0,0 +1,338 @@
+/*
+ * fs/btrfs/hotdata_hash.c
+ *
+ * Copyright (C) 2010 International Business Machines Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/list.h>
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/hardirq.h>
+#include <linux/hash.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+#include "hotdata_map.h"
+#include "hotdata_hash.h"
+#include "hotdata_relocate.h"
+#include "async-thread.h"
+#include "ctree.h"
+
+struct heat_hashlist_node *alloc_heat_hashlist_node(gfp_t mask)
+{
+	struct heat_hashlist_node *node;
+
+	node = kmalloc(sizeof(struct heat_hashlist_node), mask);
+	if (!node || IS_ERR(node))
+		return node;
+	INIT_HLIST_NODE(&node->hashnode);
+	node->freq_data = NULL;
+	node->hlist = NULL;
+	node->location = BTRFS_ON_ROTATING;
+	spin_lock_init(&node->lock);
+	spin_lock_init(&node->location_lock);
+	atomic_set(&node->refs, 1);
+
+	return node;
+}
+
+void free_heat_hashlists(struct btrfs_root *root)
+{
+	int i;
+
+	/* Free node/range heat hash lists */
+	for (i = 0; i < HEAT_HASH_SIZE; i++) {
+		struct hlist_node *pos = NULL, *pos2 = NULL;
+		struct heat_hashlist_node *heatnode = NULL;
+
+		hlist_for_each_safe(pos, pos2,
+				&root->heat_inode_hl[i].hashhead) {
+			heatnode = hlist_entry(pos, struct heat_hashlist_node,
+					hashnode);
+			hlist_del(pos);
+			kfree(heatnode);
+		}
+		hlist_for_each_safe(pos, pos2,
+				&root->heat_range_hl[i].hashhead) {
+			heatnode = hlist_entry(pos, struct heat_hashlist_node,
+					hashnode);
+			hlist_del(pos);
+			kfree(heatnode);
+		}
+	}
+}
+
+/*
+ * btrfs_get_temp is responsible for distilling the six heat criteria (which
+ * are described in detail in hotdata_hash.h) down into a single temperature
+ * value for the data, which is an integer between 0 and HEAT_MAX_VALUE.
+ *
+ * To accomplish this, the raw values from the btrfs_freq_data structure
+ * are shifted various ways in order to make the temperature calculation more
+ * or less sensitive to each value.
+ *
+ * Once this calibration has happened, we do some additional normalization and
+ * make sure that everything fits nicely in a u32. From there, we take a very
+ * rudimentary kind of average of each of the values, where the *_COEFF_POWER
+ * values act as weights for the average.
+ *
+ * Finally, we use the HEAT_HASH_BITS value, which determines the size of the
+ * heat hash list, to normalize the temperature to the proper granularity.
+ */
+int btrfs_get_temp(struct btrfs_freq_data *fdata)
+{
+	u32 result = 0;
+
+	struct timespec ckt = current_kernel_time();
+	u64 cur_time = timespec_to_ns(&ckt);
+
+	u32 nrr_heat = fdata->nr_reads << NRR_MULTIPLIER_POWER;
+	u32 nrw_heat = fdata->nr_writes << NRW_MULTIPLIER_POWER;
+
+	u64 ltr_heat = (cur_time - timespec_to_ns(&fdata->last_read_time))
+			>> LTR_DIVIDER_POWER;
+	u64 ltw_heat = (cur_time - timespec_to_ns(&fdata->last_write_time))
+			>> LTW_DIVIDER_POWER;
+
+	u64 avr_heat = (((u64) -1) - fdata->avg_delta_reads)
+			>> AVR_DIVIDER_POWER;
+   u64 

[RFC v2 PATCH 2/6] Btrfs: Add data structures for hot data tracking

2010-08-12 Thread bchociej
From: Ben Chociej <bchoc...@gmail.com>

Adds hot_inode_tree and hot_range_tree structs to keep track of
frequently accessed files and ranges within files. Trees contain
hot_{inode,range}_items representing those files and ranges, each of
which contains a btrfs_freq_data struct with its frequency of access
metrics (number of {reads, writes}, last {read,write} time, frequency of
{reads,writes}).

Having these trees means that Btrfs can quickly determine the
temperature of some data by doing some calculations on the
btrfs_freq_data struct that hangs off of the tree item.

Also, since it isn't entirely obvious, the frequency of reads or
writes is determined by taking a kind of generalized average of the last
few (2^N for some tunable N) reads or writes.

Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Signed-off-by: Conor Scott <consc...@vt.edu>
Reviewed-by: Mingming Cao <c...@us.ibm.com>
---
 fs/btrfs/hotdata_map.c |  804 
 fs/btrfs/hotdata_map.h |  167 ++
 2 files changed, 971 insertions(+), 0 deletions(-)
 create mode 100644 fs/btrfs/hotdata_map.c
 create mode 100644 fs/btrfs/hotdata_map.h
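
The "generalized average of the last few (2^N) reads or writes" mentioned
above can be approximated with a shift-based moving average of the time
deltas between accesses. The sketch below shows one common way to do
that; it is an assumption about the general technique, not necessarily
the exact formula used in hotdata_map.c, and SKETCH_FREQ_POWER and
sketch_update_avg_delta are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FREQ_POWER 3 /* hypothetical N: window of about 2^3 accesses */

/*
 * Fold a new inter-access delta into the running average:
 *   new_avg = (avg * (2^N - 1) + delta) / 2^N
 * computed with shifts only, so no division is needed.
 */
static uint64_t sketch_update_avg_delta(uint64_t avg, uint64_t delta)
{
	/* Keep the intermediate sum from overflowing 64 bits. */
	assert(avg <= (UINT64_MAX >> (SKETCH_FREQ_POWER + 1)));
	assert(delta <= (UINT64_MAX >> 1));

	return ((avg << SKETCH_FREQ_POWER) - avg + delta) >> SKETCH_FREQ_POWER;
}
```

A smaller average delta means more frequent access, which would explain
why the temperature code in patch 1/6 subtracts avg_delta_reads from the
maximum u64 value before scaling it.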

diff --git a/fs/btrfs/hotdata_map.c b/fs/btrfs/hotdata_map.c
new file mode 100644
index 000..ddae0c4
--- /dev/null
+++ b/fs/btrfs/hotdata_map.c
@@ -0,0 +1,804 @@
+/*
+ * fs/btrfs/hotdata_map.c
+ *
+ * Copyright (C) 2010 International Business Machines Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/hardirq.h>
+#include <linux/blkdev.h>
+#include "ctree.h"
+#include "hotdata_map.h"
+#include "hotdata_hash.h"
+#include "btrfs_inode.h"
+#include "volumes.h"
+
+/* kmem_cache pointers for slab caches */
+static struct kmem_cache *hot_inode_item_cache;
+static struct kmem_cache *hot_range_item_cache;
+
+static struct hot_inode_item *btrfs_update_inode_freq(struct btrfs_inode *inode,
+						      int create);
+
+static int btrfs_update_range_freq(struct hot_inode_item *he,
+				   u64 off, u64 len, int create,
+				   struct btrfs_root *root);
+
+/* init hot_inode_item kmem cache */
+int __init hot_inode_item_init(void)
+{
+	hot_inode_item_cache = kmem_cache_create("hot_inode_item",
+			sizeof(struct hot_inode_item), 0,
+			SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+	if (!hot_inode_item_cache)
+		return -ENOMEM;
+	return 0;
+}
+
+/* init hot_range_item kmem cache */
+int __init hot_range_item_init(void)
+{
+	hot_range_item_cache = kmem_cache_create("hot_range_item",
+			sizeof(struct hot_range_item), 0,
+			SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+	if (!hot_range_item_cache)
+		return -ENOMEM;
+	return 0;
+}
+
+void hot_inode_item_exit(void)
+{
+	if (hot_inode_item_cache)
+		kmem_cache_destroy(hot_inode_item_cache);
+}
+
+void hot_range_item_exit(void)
+{
+	if (hot_range_item_cache)
+		kmem_cache_destroy(hot_range_item_cache);
+}
+
+/*
+ * Initialize the inode tree. Should be called for each new inode
+ * access or other user of the hot_inode interface.
+ */
+void hot_inode_tree_init(struct hot_inode_tree *tree)
+{
+	tree->map = RB_ROOT;
+	rwlock_init(&tree->lock);
+}
+
+/*
+ * Initialize the hot range tree. Should be called for each new inode
+ * access or other user of the hot_range interface.
+ */
+void hot_range_tree_init(struct hot_range_tree *tree)
+{
+	tree->map = RB_ROOT;
+	rwlock_init(&tree->lock);
+}
+
+/*
+ * Allocate a new hot_inode_item structure. The new structure is
+ * returned with a reference count of one and needs to be
+ * freed using free_inode_item()
+ */
+struct hot_inode_item *alloc_hot_inode_item(unsigned long ino)
+{
+	struct hot_inode_item *he;
+	he = kmem_cache_alloc(hot_inode_item_cache, GFP_KERNEL | GFP_NOFS);
+	if (!he || IS_ERR(he))
+		return he;
+
+	atomic_set(&he->refs, 1);
+	he->in_tree = 0;
+	he->i_ino = ino;
+	he->heat_node = alloc_heat_hashlist_node(GFP_KERNEL | GFP_NOFS);
+	he->heat_node->freq_data = &he->freq_data;

[RFC v2 PATCH 5/6] Btrfs: 3 new ioctls related to hot data features

2010-08-12 Thread bchociej
From: Ben Chociej <bchoc...@gmail.com>

BTRFS_IOC_GET_HEAT_INFO: return a struct containing the various
metrics collected in btrfs_freq_data structs, and also return a
calculated data temperature based on those metrics. Optionally, retrieve
the temperature from the hot data hash list instead of recalculating it.

BTRFS_IOC_GET_HEAT_OPTS: return an integer representing the current
state of hot data tracking and migration:

0 = do nothing
1 = track frequency of access
2 = migrate data to fast media based on temperature (not implemented)

BTRFS_IOC_SET_HEAT_OPTS: change the state of hot data tracking and
migration, as described above.

Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Signed-off-by: Conor Scott <consc...@vt.edu>
Reviewed-by: Mingming Cao <c...@us.ibm.com>
---
 fs/btrfs/ioctl.c |  142 +-
 fs/btrfs/ioctl.h |   23 +
 2 files changed, 164 insertions(+), 1 deletions(-)
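
To make the three tracking states listed above concrete, here is a small
userspace sketch of how a get/set pair might validate and store them. The
enum and the sketch_* functions are hypothetical and only mirror the
0/1/2 values from the description; they are not the ioctl handlers from
this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The three states from the patch description (names are made up). */
enum sketch_heat_opt {
	SKETCH_HEAT_OFF = 0,     /* do nothing */
	SKETCH_HEAT_TRACK = 1,   /* track frequency of access */
	SKETCH_HEAT_MIGRATE = 2, /* migrate hot data to fast media */
};

/* Reject values outside the documented range, otherwise store them. */
static int sketch_set_heat_opts(int *state, int requested)
{
	assert(state != NULL);
	if (requested < SKETCH_HEAT_OFF || requested > SKETCH_HEAT_MIGRATE)
		return -EINVAL;
	*state = requested;
	return 0;
}

/* Return the current state unchanged. */
static int sketch_get_heat_opts(const int *state)
{
	assert(state != NULL);
	return *state;
}
```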

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4dbaf89..88cd0e7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -49,6 +49,8 @@
 #include "print-tree.h"
 #include "volumes.h"
 #include "locking.h"
+#include "hotdata_map.h"
+#include "hotdata_hash.h"
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -1869,7 +1871,7 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
 	return 0;
 }
 
-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
 	struct btrfs_ioctl_space_args space_args;
 	struct btrfs_ioctl_space_info space;
@@ -1974,6 +1976,138 @@ long btrfs_ioctl_trans_end(struct file *file)
 	return 0;
 }
 
+/*
+ * Retrieve information about access frequency for the given file. Return it in
+ * a userspace-friendly struct for btrfsctl (or another tool) to parse.
+ *
+ * The temperature that is returned can be live -- that is, recalculated when
+ * the ioctl is called -- or it can be returned from the hashtable, reflecting
+ * the (possibly old) value that the system will use when considering files
+ * for migration. This behavior is determined by heat_info->live.
+ */
+static long btrfs_ioctl_heat_info(struct file *file, void __user *argp)
+{
+	struct inode *mnt_inode = fdentry(file)->d_inode;
+	struct inode *file_inode;
+	struct file *file_filp;
+	struct btrfs_root *root = BTRFS_I(mnt_inode)->root;
+	struct btrfs_ioctl_heat_info *heat_info;
+	struct hot_inode_tree *hitree;
+	struct hot_inode_item *he;
+	int ret;
+
+	heat_info = kmalloc(sizeof(struct btrfs_ioctl_heat_info),
+			GFP_KERNEL | GFP_NOFS);
+
+	if (copy_from_user((void *) heat_info,
+			   argp,
+			   sizeof(struct btrfs_ioctl_heat_info)) != 0) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	file_filp = filp_open(heat_info->filename, O_RDONLY, 0);
+	file_inode = file_filp->f_dentry->d_inode;
+	filp_close(file_filp, NULL);
+
+	hitree = &root->hot_inode_tree;
+	read_lock(&hitree->lock);
+	he = lookup_hot_inode_item(hitree, file_inode->i_ino);
+	read_unlock(&hitree->lock);
+
+	if (!he || IS_ERR(he)) {
+		/* we don't have any info on this file yet */
+		ret = -ENODATA;
+		goto err;
+	}
+
+	spin_lock(&he->lock);
+
+	heat_info->avg_delta_reads =
+		(__u64) he->freq_data.avg_delta_reads;
+	heat_info->avg_delta_writes =
+		(__u64) he->freq_data.avg_delta_writes;
+	heat_info->last_read_time =
+		(__u64) timespec_to_ns(&he->freq_data.last_read_time);
+	heat_info->last_write_time =
+		(__u64) timespec_to_ns(&he->freq_data.last_write_time);
+	heat_info->num_reads =
+		(__u32) he->freq_data.nr_reads;
+	heat_info->num_writes =
+		(__u32) he->freq_data.nr_writes;
+
+	if (heat_info->live > 0) {
+		/* got a request for live temperature,
+		 * call btrfs_get_temp to recalculate */
+		heat_info->temperature = btrfs_get_temp(&he->freq_data);
+	} else {
+		/* not live temperature, get it from the hashlist */
+		read_lock(&he->heat_node->hlist->rwlock);
+		heat_info->temperature = he->heat_node->hlist->temperature;
+		read_unlock(&he->heat_node->hlist->rwlock);
+	}
+
+	spin_unlock(&he->lock);
+	free_hot_inode_item(he);
+
+	if (copy_to_user(argp, (void *) heat_info,
+			 sizeof(struct btrfs_ioctl_heat_info))) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	kfree(heat_info);
+	return 0;
+
+err:
+	kfree(heat_info);
+	return ret;
+}
+
+static long 

[RFC v2 PATCH 4/6] Btrfs: Add debugfs interface for hot data stats

2010-08-12 Thread bchociej
From: Ben Chociej <bchoc...@gmail.com>

Add a /sys/kernel/debug/btrfs_data/<device_name>/ directory for each
volume that contains two files. The first, `inode_data', contains the
heat information for inodes that have been brought into the hot data map
structures. The second, `range_data', contains similar information for
subfile ranges.

Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Signed-off-by: Conor Scott <consc...@vt.edu>
Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Reviewed-by: Mingming Cao <c...@us.ibm.com>
---
 fs/btrfs/debugfs.c |  532 
 fs/btrfs/debugfs.h |   89 +
 2 files changed, 621 insertions(+), 0 deletions(-)
 create mode 100644 fs/btrfs/debugfs.c
 create mode 100644 fs/btrfs/debugfs.h
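
A userspace tool that wants to read these statistics first has to build
the path of the per-volume file. The helper below is only a sketch of
that step; the function name is hypothetical, and the device name used
in any call (for example "sdb1") is an assumption about how volumes are
named under btrfs_data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/kernel/debug/btrfs_data/<device>/<file>" into buf, where
 * file would be "inode_data" or "range_data".
 * Returns 0 on success, -1 if the name is unusable or buf is too small.
 */
static int sketch_heat_stats_path(char *buf, size_t len,
				  const char *device, const char *file)
{
	int n;

	assert(buf != NULL && device != NULL && file != NULL);

	/* A device name containing '/' would escape the directory. */
	if (strchr(device, '/') != NULL)
		return -1;

	n = snprintf(buf, len, "/sys/kernel/debug/btrfs_data/%s/%s",
		     device, file);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The resulting path can then be opened with fopen() and read like any
other text file, provided debugfs is mounted.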

diff --git a/fs/btrfs/debugfs.c b/fs/btrfs/debugfs.c
new file mode 100644
index 000..c11c0b6
--- /dev/null
+++ b/fs/btrfs/debugfs.c
@@ -0,0 +1,532 @@
+/*
+ * fs/btrfs/debugfs.c
+ *
+ * This file contains the code to interface with the btrfs debugfs.
+ * The debugfs outputs range- and file-level access frequency
+ * statistics for each mounted volume.
+ *
+ * Copyright (C) 2010 International Business Machines Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+#include <linux/limits.h>
+#include "ctree.h"
+#include "hotdata_map.h"
+#include "hotdata_hash.h"
+#include "hotdata_relocate.h"
+#include "debugfs.h"
+
+static int copy_msg_to_log(struct debugfs_vol_data *data, char *msg, int len)
+{
+	struct lstring *debugfs_log = data->debugfs_log;
+	uint new_log_alloc_size;
+	char *new_log;
+
+	if (len >= data->log_alloc_size - debugfs_log->len) {
+		/* Not enough room in the log buffer for the new message. */
+		/* Allocate a bigger buffer. */
+		new_log_alloc_size = data->log_alloc_size + LOG_PAGE_SIZE;
+		new_log = vmalloc(new_log_alloc_size);
+
+		if (new_log) {
+			memcpy(new_log, debugfs_log->str,
+				debugfs_log->len);
+			memset(new_log + debugfs_log->len, 0,
+				new_log_alloc_size - debugfs_log->len);
+			vfree(debugfs_log->str);
+			debugfs_log->str = new_log;
+			data->log_alloc_size = new_log_alloc_size;
+		} else {
+			WARN_ON(1);
+			if (data->log_alloc_size - debugfs_log->len) {
+				#define err_msg "No more memory!\n"
+				strlcpy(debugfs_log->str +
+					debugfs_log->len,
+					err_msg, data->log_alloc_size -
+					debugfs_log->len);
+				debugfs_log->len +=
+				min((typeof(debugfs_log->len))
+					sizeof(err_msg),
+					((typeof(debugfs_log->len))
+					data->log_alloc_size -
+					debugfs_log->len));
+			}
+			return 0;
+		}
+	}
+
+	memcpy(debugfs_log->str + debugfs_log->len,
+		data->log_work_buff, len);
+	debugfs_log->len += (unsigned long) len;
+
+	return len;
+}
+
+/* Returns the number of bytes written to the log. */
+static int debugfs_log(struct debugfs_vol_data *data, const char *fmt, ...)
+{
+	struct lstring *debugfs_log = data->debugfs_log;
+	va_list args;
+	int len;
+
+	if (debugfs_log->str == NULL)
+		return -1;
+
+	spin_lock(&data->log_lock);
+
+	va_start(args, fmt);
+	len = vsnprintf(data->log_work_buff, sizeof(data->log_work_buff), fmt,
+			args);
+	va_end(args);
+
+	if (len >= sizeof(data->log_work_buff)) {
+		#define truncate_msg "The next message has been truncated.\n"
+		copy_msg_to_log(data, truncate_msg, sizeof(truncate_msg));
+	}
+
+	len = copy_msg_to_log(data, data->log_work_buff, len);
+	spin_unlock(&data->log_lock);
+
+	return len;
+}
+
+/* initialize a log corresponding to a btrfs 

[RFC v2 PATCH 3/6] Btrfs: Add hot data relocation facilities

2010-08-12 Thread bchociej
From: Ben Chociej <bchoc...@gmail.com>

The relocation code operates on the heat hash lists to identify hot or
cold data logical file ranges that are candidates for relocation. The
triggering mechanism for relocation is controlled by a global heat
threshold integer value (fs_root->heat_threshold). Ranges are queued for
relocation by the periodically-executing relocate kthread, which updates
the global heat threshold and responds to space pressure on the SSDs.

The heat hash lists index logical ranges by heat and provide a
constant-time access path to hot or cold range items. The relocation
kthread uses this path to find hot or cold items to move to/from SSD. To
ensure that the relocation kthread has a chance to sleep, and to prevent
thrashing between SSD and HDD, there is a configurable limit to how many
ranges are moved per iteration of the kthread. This limit may be overrun
in the case where space pressure requires that items be aggressively
moved from SSD back to HDD.

This needs still more resistance to thrashing and stronger (read:
actual) guarantees that relocation operations won't -ENOSPC.

The relocation code has introduced two new btrfs block group types:
BTRFS_BLOCK_GROUP_DATA_SSD and BTRFS_BLOCK_GROUP_METADATA_SSD. The latter
is not currently implemented; to wit, this implementation does not move
any metadata, including inlined extents, to SSD.

When mkfs'ing a volume with the hot data relocation option, initial
block groups are allocated to the proper disks. Runtime block group
allocation only allocates BTRFS_BLOCK_GROUP_DATA,
BTRFS_BLOCK_GROUP_METADATA and BTRFS_BLOCK_GROUP_SYSTEM to HDD, and
likewise only allocates BTRFS_BLOCK_GROUP_DATA_SSD and
BTRFS_BLOCK_GROUP_METADATA_SSD to SSD (assuming, critically, the
HOTDATAMOVE option is set at mount time).

Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Signed-off-by: Conor Scott <consc...@vt.edu>
Reviewed-by: Mingming Cao <c...@us.ibm.com>
---
 fs/btrfs/hotdata_relocate.c |  783 +++
 fs/btrfs/hotdata_relocate.h |   73 
 2 files changed, 856 insertions(+), 0 deletions(-)
 create mode 100644 fs/btrfs/hotdata_relocate.c
 create mode 100644 fs/btrfs/hotdata_relocate.h
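
The interaction between the heat threshold and the per-iteration limit
described above can be pictured with a toy model in which each heat
bucket only holds a count of ranges waiting to be moved. This is a
sketch under that simplification; the bucket count, the function name
and the "hottest first" order are assumptions, not details taken from
hotdata_relocate.c.

```c
#include <assert.h>

#define SKETCH_NUM_BUCKETS 8 /* hypothetical number of heat buckets */

/*
 * One pass of a toy relocation thread: walk buckets from hottest to
 * coldest, stop below the heat threshold, and never move more than
 * `limit` ranges so the thread gets a chance to sleep.
 * counts[t] is the number of ranges at temperature t still waiting.
 * Returns how many ranges were "moved" in this pass.
 */
static unsigned sketch_relocate_pass(unsigned counts[SKETCH_NUM_BUCKETS],
				     unsigned threshold, unsigned limit)
{
	unsigned moved = 0;
	int t;

	assert(threshold < SKETCH_NUM_BUCKETS);

	for (t = SKETCH_NUM_BUCKETS - 1;
	     t >= (int)threshold && moved < limit; t--) {
		unsigned take = counts[t];

		/* Respect what is left of this pass's budget. */
		if (take > limit - moved)
			take = limit - moved;
		counts[t] -= take;
		moved += take;
	}
	return moved;
}
```

Ranges left over when the budget runs out stay in their bucket and are
picked up by a later pass, which is what bounds the work per iteration.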

diff --git a/fs/btrfs/hotdata_relocate.c b/fs/btrfs/hotdata_relocate.c
new file mode 100644
index 000..c5060c4
--- /dev/null
+++ b/fs/btrfs/hotdata_relocate.c
@@ -0,0 +1,783 @@
+/*
+ * fs/btrfs/hotdata_relocate.c
+ *
+ * Copyright (C) 2010 International Business Machines Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/kthread.h>
+#include <linux/list.h>
+#include <linux/freezer.h>
+#include <linux/spinlock.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/slab.h>
+#include "hotdata_map.h"
+#include "hotdata_relocate.h"
+#include "btrfs_inode.h"
+#include "ctree.h"
+#include "volumes.h"
+
+/*
+ * Hot data relocation strategy:
+ *
+ * The relocation code below operates on the heat hash lists to identify
+ * hot or cold data logical file ranges that are candidates for relocation.
+ * The triggering mechanism for relocation is controlled by a global heat
+ * threshold integer value (fs_root->heat_threshold). Ranges are queued
+ * for relocation by the periodically executing relocate kthread, which
+ * updates the global heat threshold and responds to space pressure on the
+ * SSDs.
+ *
+ * The heat hash lists index logical ranges by heat and provide a constant-time
+ * access path to hot or cold range items. The relocation kthread uses this
+ * path to find hot or cold items to move to/from SSD. To ensure that the
+ * relocation kthread has a chance to sleep, and to prevent thrashing between
+ * SSD and HDD, there is a configurable limit to how many ranges are moved per
+ * iteration of the kthread. This limit may be overrun in the case where space
+ * pressure requires that items be aggressively moved from SSD back to HDD.
+ *
+ * This needs still more resistance to thrashing and stronger (read: actual)
+ * guarantees that relocation operations won't -ENOSPC.
+ *
+ * The relocation code has introduced two new btrfs block group types:
+ * BTRFS_BLOCK_GROUP_DATA_SSD and BTRFS_BLOCK_GROUP_METADATA_SSD. The latter is
+ * not currently implemented; to wit, this implementation does not move any
+ * metadata *including inlined extents* to SSD.
+ *
+ * When mkfs'ing a volume with the hot 

[PATCH 0/2] Btrfs-progs: Add support for hot data migration

2010-08-12 Thread bchociej
This patch set introduces functionality into btrfsctl and mkfs.btrfs to
support the kernel patches for hot data tracking and migration to SSD
with Btrfs. New functionality includes a -h option to mkfs.btrfs to
preallocate appropriate block group types for SSD data migration, and
also includes additional options for btrfsctl to interact with the new
ioctls introduced by the kernel patches.


DIFFSTAT:

 btrfsctl.c|  111 +++-
 ctree.h   |2 +
 extent-tree.c |2 +-
 ioctl-test.c  |3 +
 ioctl.h   |   24 +
 mkfs.c|  131 ---
 utils.c   |1 +
 volumes.c |   73 +-
 volumes.h |3 +-
 9 files changed, 326 insertions(+), 24 deletions(-)


Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Tested-by: Conor Scott <consc...@vt.edu>


[PATCH 2/2] Btrfs-progs: Add hot data support in mkfs

2010-08-12 Thread bchociej
From: Ben Chociej <bchoc...@gmail.com>

Modified mkfs.btrfs to add hot data relocation option (-h) which
preallocates BTRFS_BLOCK_GROUP_DATA_SSD and
BTRFS_BLOCK_GROUP_METADATA_SSD at mkfs time for future use by hot data
relocation code.  Also added a userspace function to detect whether a
block device is an SSD by reading the sysfs block queue rotational flag.

Signed-off-by: Ben Chociej <bchoc...@gmail.com>
Signed-off-by: Matt Lupfer <mlup...@gmail.com>
Tested-by: Conor Scott <consc...@vt.edu>
---
 ctree.h   |2 +
 extent-tree.c |2 +-
 mkfs.c|  131 +
 utils.c   |1 +
 volumes.c |   73 +++-
 volumes.h |3 +-
 6 files changed, 190 insertions(+), 22 deletions(-)
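
The SSD detection mentioned above relies on the sysfs attribute
/sys/block/<disk>/queue/rotational, which reads 0 for non-rotational
devices and 1 for rotational ones. The following is a minimal sketch of
such a check; the sketch_* names are hypothetical, and mapping a
partition or device node to its parent disk name is left out.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Interpret the contents of a "rotational" attribute.
 * Returns 1 for an SSD ("0"), 0 for a rotating disk ("1"),
 * and -1 if the content is anything else.
 */
static int sketch_parse_rotational(FILE *f)
{
	int c;

	assert(f != NULL);
	c = fgetc(f);
	if (c == '0')
		return 1;
	if (c == '1')
		return 0;
	return -1;
}

/* Look up the flag for a whole-disk name such as "sda" (assumed form). */
static int sketch_is_ssd(const char *disk)
{
	char path[256];
	FILE *f;
	int n, ret;

	assert(disk != NULL);
	n = snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational",
		     disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	ret = sketch_parse_rotational(f);
	fclose(f);
	return ret;
}
```

Keeping the parsing separate from the path lookup makes the logic easy
to exercise without real hardware.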

diff --git a/ctree.h b/ctree.h
index 64ecf12..8c29122 100644
--- a/ctree.h
+++ b/ctree.h
@@ -640,6 +640,8 @@ struct btrfs_csum_item {
 #define BTRFS_BLOCK_GROUP_RAID1    (1 << 4)
 #define BTRFS_BLOCK_GROUP_DUP      (1 << 5)
 #define BTRFS_BLOCK_GROUP_RAID10   (1 << 6)
+#define BTRFS_BLOCK_GROUP_DATA_SSD (1 << 7)
+#define BTRFS_BLOCK_GROUP_METADATA_SSD (1 << 8)
 
 struct btrfs_block_group_item {
 	__le64 used;
diff --git a/extent-tree.c b/extent-tree.c
index b2f9bb2..a6b2beb 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1812,7 +1812,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	    thresh)
 		return 0;
 
-	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags);
+	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags, 0);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
 		return 0;
diff --git a/mkfs.c b/mkfs.c
index 2e99b95..f45cfc3 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -69,7 +69,61 @@ static u64 parse_size(char *s)
return atol(s) * mult;
 }
 
-static int make_root_dir(struct btrfs_root *root)
+static int make_root_dir2(struct btrfs_root *root, int hotdata)
+{
+   struct btrfs_trans_handle *trans;
+   u64 chunk_start = 0;
+   u64 chunk_size = 0;
+   int ret;
+
+   trans = btrfs_start_transaction(root, 1);
+
+   /*
+* If hotdata option is set, preallocate a metadata SSD block group
+* (not currently used)
+*/
+   if (hotdata) {
+   ret = btrfs_alloc_chunk(trans, root-fs_info-extent_root,
+   chunk_start, chunk_size,
+   BTRFS_BLOCK_GROUP_METADATA_SSD, hotdata);
+   BUG_ON(ret);
+   ret = btrfs_make_block_group(trans, root, 0,
+BTRFS_BLOCK_GROUP_METADATA_SSD,
+BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+chunk_start, chunk_size);
+   BUG_ON(ret);
+   }
+
+   ret = btrfs_alloc_chunk(trans, root-fs_info-extent_root,
+   chunk_start, chunk_size,
+   BTRFS_BLOCK_GROUP_DATA, hotdata);
+   BUG_ON(ret);
+   ret = btrfs_make_block_group(trans, root, 0,
+BTRFS_BLOCK_GROUP_DATA,
+BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+chunk_start, chunk_size);
+   BUG_ON(ret);
+
+   /*
+* If hotdata option is set, preallocate a data SSD block group
+*/
+   if (hotdata) {
+   ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
+   &chunk_start, &chunk_size,
+   BTRFS_BLOCK_GROUP_DATA_SSD, hotdata);
+   BUG_ON(ret);
+   ret = btrfs_make_block_group(trans, root, 0,
+BTRFS_BLOCK_GROUP_DATA_SSD,
+BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+chunk_start, chunk_size);
+   BUG_ON(ret);
+   }
+
+   btrfs_commit_transaction(trans, root);
+   return ret;
+}
+
+static int make_root_dir(struct btrfs_root *root, int hotdata)
 {
struct btrfs_trans_handle *trans;
struct btrfs_key location;
@@ -90,7 +144,7 @@ static int make_root_dir(struct btrfs_root *root)
 
ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
&chunk_start, &chunk_size,
-   BTRFS_BLOCK_GROUP_METADATA);
+   BTRFS_BLOCK_GROUP_METADATA, hotdata);
BUG_ON(ret);
ret = btrfs_make_block_group(trans, root, 0,
 BTRFS_BLOCK_GROUP_METADATA,
@@ -103,16 +157,6 @@ static int make_root_dir(struct btrfs_root *root)
trans = btrfs_start_transaction(root, 1);
BUG_ON(!trans);
 
-   ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
-   &chunk_start, &chunk_size,
-   BTRFS_BLOCK_GROUP_DATA);
-   BUG_ON(ret);
-   
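
The block group flags touched by the ctree.h hunk above are single bits
in one mask, so the two proposed SSD bits can be OR-ed with the existing
profile bits and tested independently. A minimal sketch of that idea,
using only the values shown in the hunk; DEMO_PROFILE_MASK, DEMO_SSD_MASK
and demo_is_ssd_group() are hypothetical names for illustration and are
not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Existing profile bits, as shown in the ctree.h hunk. */
#define BTRFS_BLOCK_GROUP_RAID1        (1 << 4)
#define BTRFS_BLOCK_GROUP_DUP          (1 << 5)
#define BTRFS_BLOCK_GROUP_RAID10       (1 << 6)
/* Bits proposed by the patch. */
#define BTRFS_BLOCK_GROUP_DATA_SSD     (1 << 7)
#define BTRFS_BLOCK_GROUP_METADATA_SSD (1 << 8)

/* Hypothetical masks, for illustration only. */
#define DEMO_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID1 | \
			   BTRFS_BLOCK_GROUP_DUP | \
			   BTRFS_BLOCK_GROUP_RAID10)
#define DEMO_SSD_MASK     (BTRFS_BLOCK_GROUP_DATA_SSD | \
			   BTRFS_BLOCK_GROUP_METADATA_SSD)

/* The new bits must not collide with the profile bits listed above. */
static_assert((DEMO_PROFILE_MASK & DEMO_SSD_MASK) == 0,
	      "SSD bits overlap profile bits");

/* Hypothetical helper: returns 1 if either SSD bit is set in flags. */
int demo_is_ssd_group(uint64_t flags)
{
	return (flags & DEMO_SSD_MASK) != 0;
}
```

With these definitions, demo_is_ssd_group(BTRFS_BLOCK_GROUP_DATA_SSD)
returns 1, while a plain RAID1 or DUP mask returns 0.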

[PATCH] btrfs: avoid duplications by moving the static int array from header to c file

2010-08-12 Thread Cheng Renquan
Commit 607d432d introduced a static int array in ctree.h, along with a
static inline function (btrfs_super_csum_size) that uses this array.
The obvious problem is that every C file calling that function gets its
own local copy of the array, so calls from multiple C files result in
multiple copies of it:

$ nm fs/btrfs/btrfs.ko | grep btrfs_csum_sizes
010c r btrfs_csum_sizes
0114 r btrfs_csum_sizes
01c0 r btrfs_csum_sizes
05a0 r btrfs_csum_sizes

The original commit called this static inline function from 4 C files,
and the same 4 files still call it today, so there are 4 copies of
btrfs_csum_sizes. Future code may call it from more C files, resulting
in more copies.

 fs/btrfs/ctree.h |   19 -
 fs/btrfs/disk-io.c   |   25 +
 fs/btrfs/file-item.c |   56 -
 fs/btrfs/ioctl.c |9 ---
 fs/btrfs/tree-log.c  |   10 +---
 5 files changed, 81 insertions(+), 38 deletions(-)

Multiple copies just waste memory; moving the array to a C file avoids
the duplication. Since the function uses ARRAY_SIZE on that array, it
must see the array definition at compile time, so it has to move to the
C file together with the array and can no longer be inlined.

The cost is that calls which used to be inlined become calls to an
external function.

Signed-off-by: Cheng Renquan <crq...@gmail.com>
---
 fs/btrfs/ctree.c |9 +
 fs/btrfs/ctree.h |9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index c3df14c..3a89207 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -24,6 +24,15 @@
 #include "print-tree.h"
 #include "locking.h"
 
+int btrfs_super_csum_size(struct btrfs_super_block *s)
+{
+   static const int btrfs_csum_sizes[] = { 4, 0 };
+
+   int t = btrfs_super_csum_type(s);
+   BUG_ON(t >= ARRAY_SIZE(btrfs_csum_sizes));
+   return btrfs_csum_sizes[t];
+}
+
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_path *path, int level);
 static int split_leaf(struct btrfs_trans_handle *trans, struct btrfs_root
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e9bf864..99220ee 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -132,8 +132,6 @@ struct btrfs_ordered_sum;
 /* csum types */
 #define BTRFS_CSUM_TYPE_CRC32  0
 
-static int btrfs_csum_sizes[] = { 4, 0 };
-
 /* four bytes for CRC32 */
 #define BTRFS_EMPTY_DIR_SIZE 0
 
@@ -1877,12 +1875,7 @@ BTRFS_SETGET_STACK_FUNCS(super_incompat_flags, struct btrfs_super_block,
 BTRFS_SETGET_STACK_FUNCS(super_csum_type, struct btrfs_super_block,
 csum_type, 16);
 
-static inline int btrfs_super_csum_size(struct btrfs_super_block *s)
-{
-   int t = btrfs_super_csum_type(s);
-   BUG_ON(t >= ARRAY_SIZE(btrfs_csum_sizes));
-   return btrfs_csum_sizes[t];
-}
+int btrfs_super_csum_size(struct btrfs_super_block *s);
 
 static inline unsigned long btrfs_leaf_data(struct extent_buffer *l)
 {
-- 
1.7.0.4
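
For readers who want to try the pattern outside the kernel tree, here is
a minimal userspace sketch of what the patch does: the lookup table
becomes a function-local static const array inside one out-of-line
function, so the program carries a single copy however many files call
it. struct demo_super_block and demo_super_csum_size() are hypothetical
stand-ins for the btrfs types, and assert() stands in for BUG_ON(); this
is only an illustration, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct btrfs_super_block. */
struct demo_super_block {
	unsigned int csum_type;
};

/*
 * Defined once in a .c file; a header would only carry the prototype.
 * The table has static storage inside this single function definition,
 * so it is not duplicated per including file.
 */
int demo_super_csum_size(const struct demo_super_block *s)
{
	static const int csum_sizes[] = { 4, 0 };
	size_t t = s->csum_type;

	/* Reject types beyond the table; stands in for BUG_ON(). */
	assert(t < ARRAY_SIZE(csum_sizes));
	return csum_sizes[t];
}
```

A caller with csum_type set to 0 (BTRFS_CSUM_TYPE_CRC32 in the patch)
gets 4 back. The trade-off is the one named in the commit message: the
lookup is now a real function call instead of inlined code.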
