Re: [RFC v2 02/10] vfs: add support for updating access frequency

2012-09-26 Thread Zhi Yong Wu
On Thu, Sep 27, 2012 at 10:19 AM, Dave Chinner  wrote:
> On Wed, Sep 26, 2012 at 10:53:07AM +0800, Zhi Yong Wu wrote:
>> On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner  wrote:
>> > On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
>> > I note that the code will always insert range items of a length
>> > RANGE_SIZE. This means you have a fixed object granularity and hence
>> > you have no need for a range based search. That is, you could use a
>> > radix tree where each entry in the radix tree points directly to the
>> > range object similar to how the page cache uses a radix tree for
>> > indexing pages. That brings the possibility of lockless range item
>> > lookups
>> Great suggestion, but can we put it on the TODO list for now? It
>> would be a fairly big code change.
>
> Sure. I just wanted to point out that there are better choices for
> indexing fixed size elements than rb-trees and why it might make
> sense to use a different type of tree.
Got it, thanks. Moreover, it would probably be better to use a radix
tree to hold hot_inode items as well, not only hot_range items.
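Something like the following untested sketch is what I have in mind; the
wrapper type, lock, and function names below are only placeholders, not
part of the current patch:

#include <linux/kref.h>
#include <linux/radix-tree.h>
#include <linux/spinlock.h>

/* Placeholder wrapper: a radix tree keyed by inode number. */
struct hot_inode_idx {
        struct radix_tree_root map;
        spinlock_t lock;
};

static void hot_inode_idx_init(struct hot_inode_idx *idx)
{
        INIT_RADIX_TREE(&idx->map, GFP_ATOMIC);
        spin_lock_init(&idx->lock);
}

/* Return 0 on success, or -EEXIST if the inode is already tracked. */
static int hot_inode_idx_add(struct hot_inode_idx *idx,
                             struct hot_inode_item *he)
{
        int ret;

        spin_lock(&idx->lock);
        ret = radix_tree_insert(&idx->map, he->i_ino, he);
        if (!ret)
                kref_get(&he->refs);
        spin_unlock(&idx->lock);

        return ret;
}

/*
 * Lookup takes a reference on the item; as you suggest, this could
 * later be made lockless with RCU.
 */
static struct hot_inode_item *
hot_inode_idx_lookup(struct hot_inode_idx *idx, unsigned long ino)
{
        struct hot_inode_item *he;

        spin_lock(&idx->lock);
        he = radix_tree_lookup(&idx->map, ino);
        if (he)
                kref_get(&he->refs);
        spin_unlock(&idx->lock);

        return he;
}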

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com



-- 
Regards,

Zhi Yong Wu


Re: [RFC v2 02/10] vfs: add support for updating access frequency

2012-09-26 Thread Dave Chinner
On Wed, Sep 26, 2012 at 10:53:07AM +0800, Zhi Yong Wu wrote:
> On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner  wrote:
> > On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
> > I note that the code will always insert range items of a length
> > RANGE_SIZE. This means you have a fixed object granularity and hence
> > you have no need for a range based search. That is, you could use a
> > radix tree where each entry in the radix tree points directly to the
> > range object similar to how the page cache uses a radix tree for
> > indexing pages. That brings the possibility of lockless range item
> > lookups
> Great suggestion, but can we put it on the TODO list for now? It
> would be a fairly big code change.

Sure. I just wanted to point out that there are better choices for
indexing fixed size elements than rb-trees and why it might make
sense to use a different type of tree.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC v2 02/10] vfs: add support for updating access frequency

2012-09-25 Thread Zhi Yong Wu
Heartfelt thanks for your review, Dave. It is very helpful to me.

On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner  wrote:
> On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
>> From: Zhi Yong Wu 
>>
>>   Add some utility helpers to update the access frequency
>> of a file or a range within it.
>>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>  fs/hot_tracking.c |  359 +
>>  fs/hot_tracking.h |   15 +++
>>  2 files changed, 374 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
>> index 173054b..52ed926 100644
>> --- a/fs/hot_tracking.c
>> +++ b/fs/hot_tracking.c
>> @@ -106,6 +106,365 @@ inode_err:
>>  }
>>
>>  /*
>> + * Drops the reference out on hot_inode_item by one and free the structure
>> + * if the reference count hits zero
>> + */
>> +void hot_rb_free_hot_inode_item(struct hot_inode_item *he)
>
> hot_inode_item_put()
>
>> +{
>> + if (!he)
>> + return;
>
> It's a bug to call a put function on a kref counted item with a null
> pointer. Let the kernel crash so it is noticed and fixed.
OK, will remove it.
>
>> +
>> + if (atomic_dec_and_test(&he->refs.refcount)) {
>> + WARN_ON(he->in_tree);
>> + kmem_cache_free(hot_inode_item_cache, he);
>> + }
>
> Isn't this abusing krefs? i.e. this should be:
Sorry about that, and thanks for the explanation below:
>
> hot_inode_item_free()
> {
> WARN_ON(he->in_tree);
> kmem_cache_free(hot_inode_item_cache, he);
> }
>
> hot_inode_item_put()
> {
> kref_put(&he->refs, hot_inode_item_free)
> }
>
>> +/*
>> + * Drops the reference out on hot_range_item by one and free the structure
>> + * if the reference count hits zero
>> + */
>> +static void hot_rb_free_hot_range_item(struct hot_range_item *hr)
>
> same comments as above.
OK, thanks.
> 
>> +static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
>> + unsigned long inode_num,
>> + struct rb_node *node)
>
> static struct rb_node *
> hot_inode_item_find(. )
OK, thanks.
>
>> +{
>> + struct rb_node **p = &root->rb_node;
>> + struct rb_node *parent = NULL;
>> + struct hot_inode_item *entry;
>> +
>> + /* walk tree to find insertion point */
>> + while (*p) {
>> + parent = *p;
>> + entry = rb_entry(parent, struct hot_inode_item, rb_node);
>> +
>> + if (inode_num < entry->i_ino)
>> + p = &(*p)->rb_left;
>> + else if (inode_num > entry->i_ino)
>> + p = &(*p)->rb_right;
>> + else
>> + return parent;
>> + }
>> +
>> + entry = rb_entry(node, struct hot_inode_item, rb_node);
>> + entry->in_tree = 1;
>> + rb_link_node(node, parent, p);
>> + rb_insert_color(node, root);
>
> So the hot inode item search key is the inode number? Why use an
> rb-tree then? Wouldn't a btree be a more memory efficient way to
> hold a sparse index that has fixed key and pointer sizes?
Yes, the search key is the inode number. And yes, I know, but if we
don't use a btree, what would be better? A radix tree?

>
> Also, the API seems quite strange. You pass in the rb_node and
> the inode number instead of passing in the hot inode item that
> already holds this information. You then convert the rb_node back to
> a hot inode item to set the in_tree variable. So why not just pass
> in the struct hot_inode_item in the first place?
Good catch, thanks for the reminder.
>
>> +static u64 hot_rb_range_end(struct hot_range_item *hr)
>
> hot_range_item_end()
OK
>
>> +{
>> + if (hr->start + hr->len < hr->start)
>> + return (u64)-1;
>> +
>> + return hr->start + hr->len - 1;
>> +}
>> +
>> +static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,
>
> hot_range_item_find()
OK
>
>> + u64 start,
>> + struct rb_node *node)
>> +{
>> + struct rb_node **p = &root->rb_node;
>> + struct rb_node *parent = NULL;
>> + struct hot_range_item *entry;
>> +
>> + /* ensure start is on a range boundary */
>> + start = start & RANGE_SIZE_MASK;
>> + /* walk tree to find insertion point */
>> + while (*p) {
>> + parent = *p;
>> + entry = rb_entry(parent, struct hot_range_item, rb_node);
>> +
>> + if (start < entry->start)
>> + p = &(*p)->rb_left;
>> + else if (start >= hot_rb_range_end(entry))
>
> Shouldn't an aligned end always be one byte short of the start
> offset of the next aligned region? i.e. start ==
> hot_rb_range_end(entry) is an indication of an off-by one bug
> somewhere?
That is a really good catch, thanks.
>
>> + p = &(*p)->rb_right;
>> + else
>> + return parent;
>> + }
>> +
>> + entry = rb_entry(node, struct hot

Re: [RFC v2 02/10] vfs: add support for updating access frequency

2012-09-25 Thread Dave Chinner
On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
> From: Zhi Yong Wu 
> 
>   Add some utility helpers to update the access frequency
> of a file or a range within it.
> 
> Signed-off-by: Zhi Yong Wu 
> ---
>  fs/hot_tracking.c |  359 +
>  fs/hot_tracking.h |   15 +++
>  2 files changed, 374 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
> index 173054b..52ed926 100644
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -106,6 +106,365 @@ inode_err:
>  }
>  
>  /*
> + * Drops the reference out on hot_inode_item by one and free the structure
> + * if the reference count hits zero
> + */
> +void hot_rb_free_hot_inode_item(struct hot_inode_item *he)

hot_inode_item_put()

> +{
> + if (!he)
> + return;

It's a bug to call a put function on a kref counted item with a null
pointer. Let the kernel crash so it is noticed and fixed.

> +
> + if (atomic_dec_and_test(&he->refs.refcount)) {
> + WARN_ON(he->in_tree);
> + kmem_cache_free(hot_inode_item_cache, he);
> + }

Isn't this abusing krefs? i.e. this should be:

hot_inode_item_free()
{
WARN_ON(he->in_tree);
kmem_cache_free(hot_inode_item_cache, he);
}

hot_inode_item_put()
{
kref_put(&he->refs, hot_inode_item_free)
}
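Spelled out in full, that pattern would look something like the untested
sketch below; it assumes the kref really is the 'refs' member embedded in
struct hot_inode_item:

/* Release callback for kref_put(): runs once the count hits zero. */
static void hot_inode_item_free(struct kref *kref)
{
        struct hot_inode_item *he = container_of(kref,
                                        struct hot_inode_item, refs);

        WARN_ON(he->in_tree);
        kmem_cache_free(hot_inode_item_cache, he);
}

/* Drop a reference; hot_inode_item_free() is called on the final put. */
void hot_inode_item_put(struct hot_inode_item *he)
{
        kref_put(&he->refs, hot_inode_item_free);
}

The same pattern applies to hot_range_item.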

> +/*
> + * Drops the reference out on hot_range_item by one and free the structure
> + * if the reference count hits zero
> + */
> +static void hot_rb_free_hot_range_item(struct hot_range_item *hr)

same comments as above.

> +static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
> + unsigned long inode_num,
> + struct rb_node *node)

static struct rb_node *
hot_inode_item_find(. )

> +{
> + struct rb_node **p = &root->rb_node;
> + struct rb_node *parent = NULL;
> + struct hot_inode_item *entry;
> +
> + /* walk tree to find insertion point */
> + while (*p) {
> + parent = *p;
> + entry = rb_entry(parent, struct hot_inode_item, rb_node);
> +
> + if (inode_num < entry->i_ino)
> + p = &(*p)->rb_left;
> + else if (inode_num > entry->i_ino)
> + p = &(*p)->rb_right;
> + else
> + return parent;
> + }
> +
> + entry = rb_entry(node, struct hot_inode_item, rb_node);
> + entry->in_tree = 1;
> + rb_link_node(node, parent, p);
> + rb_insert_color(node, root);

So the hot inode item search key is the inode number? Why use an
rb-tree then? Wouldn't a btree be a more memory efficient way to
hold a sparse index that has fixed key and pointer sizes?

Also, the API seems quite strange. You pass in the rb_node and
the inode number instead of passing in the hot inode item that
already holds this information. You then convert the rb_node back to
a hot inode item to set the in_tree variable. So why not just pass
in the struct hot_inode_item in the first place?
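An untested sketch of what I mean, using the name suggested above and
taking the item itself:

/*
 * Returns the rb_node of an existing item with the same inode number,
 * or NULL after linking the new item into the tree.
 */
static struct rb_node *hot_inode_item_find(struct rb_root *root,
                                           struct hot_inode_item *he)
{
        struct rb_node **p = &root->rb_node;
        struct rb_node *parent = NULL;
        struct hot_inode_item *entry;

        while (*p) {
                parent = *p;
                entry = rb_entry(parent, struct hot_inode_item, rb_node);

                if (he->i_ino < entry->i_ino)
                        p = &(*p)->rb_left;
                else if (he->i_ino > entry->i_ino)
                        p = &(*p)->rb_right;
                else
                        return parent;
        }

        he->in_tree = 1;
        rb_link_node(&he->rb_node, parent, p);
        rb_insert_color(&he->rb_node, root);

        return NULL;
}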

> +static u64 hot_rb_range_end(struct hot_range_item *hr)

hot_range_item_end()

> +{
> + if (hr->start + hr->len < hr->start)
> + return (u64)-1;
> +
> + return hr->start + hr->len - 1;
> +}
> +
> +static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,

hot_range_item_find()

> + u64 start,
> + struct rb_node *node)
> +{
> + struct rb_node **p = &root->rb_node;
> + struct rb_node *parent = NULL;
> + struct hot_range_item *entry;
> +
> + /* ensure start is on a range boundary */
> + start = start & RANGE_SIZE_MASK;
> + /* walk tree to find insertion point */
> + while (*p) {
> + parent = *p;
> + entry = rb_entry(parent, struct hot_range_item, rb_node);
> +
> + if (start < entry->start)
> + p = &(*p)->rb_left;
> + else if (start >= hot_rb_range_end(entry))

Shouldn't an aligned end always be one byte short of the start
offset of the next aligned region? i.e. start ==
hot_rb_range_end(entry) is an indication of an off-by one bug
somewhere?

> + p = &(*p)->rb_right;
> + else
> + return parent;
> + }
> +
> + entry = rb_entry(node, struct hot_range_item, rb_node);
> + entry->in_tree = 1;
> + rb_link_node(node, parent, p);
> + rb_insert_color(node, root);

Same comments as the hot_inode_range.

Also, the start offset in the hot_range_item is already aligned, so
why do you need to align it again? 
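For illustration, with a hypothetical RANGE_SIZE of 4096, a range starting
at 4096 covers bytes 4096..8191, so hot_rb_range_end() returns 8191 and
the next aligned start is 8192; an aligned start can therefore never equal
an existing range's end. Folding that and the redundant re-alignment in,
an untested sketch of the range version might look like:

static struct rb_node *hot_range_item_find(struct rb_root *root,
                                           struct hot_range_item *hr)
{
        struct rb_node **p = &root->rb_node;
        struct rb_node *parent = NULL;
        struct hot_range_item *entry;

        /* hr->start is assumed to already be RANGE_SIZE aligned. */
        while (*p) {
                parent = *p;
                entry = rb_entry(parent, struct hot_range_item, rb_node);

                if (hr->start < entry->start)
                        p = &(*p)->rb_left;
                else if (hr->start > hot_rb_range_end(entry))
                        p = &(*p)->rb_right;
                else
                        return parent;
        }

        hr->in_tree = 1;
        rb_link_node(&hr->rb_node, parent, p);
        rb_insert_color(&hr->rb_node, root);

        return NULL;
}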

> +
> + return NULL;
> +}
> +
> +/*
> + * Add a hot_inode_item to a hot_inode_tree. If the tree already contains
> + * an item with the index given, return -EEXIST
> + */
> +static int hot_rb_add_hot_inode_item(struct hot_inode_tree *tree,
> + stru

[RFC v2 02/10] vfs: add support for updating access frequency

2012-09-23 Thread zwu . kernel
From: Zhi Yong Wu 

  Add some utility helpers to update the access frequency
of a file or a range within it.

Signed-off-by: Zhi Yong Wu 
---
 fs/hot_tracking.c |  359 +
 fs/hot_tracking.h |   15 +++
 2 files changed, 374 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 173054b..52ed926 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -106,6 +106,365 @@ inode_err:
 }
 
 /*
+ * Drops the reference out on hot_inode_item by one and free the structure
+ * if the reference count hits zero
+ */
+void hot_rb_free_hot_inode_item(struct hot_inode_item *he)
+{
+   if (!he)
+   return;
+
+   if (atomic_dec_and_test(&he->refs.refcount)) {
+   WARN_ON(he->in_tree);
+   kmem_cache_free(hot_inode_item_cache, he);
+   }
+}
+
+/*
+ * Drops the reference out on hot_range_item by one and free the structure
+ * if the reference count hits zero
+ */
+static void hot_rb_free_hot_range_item(struct hot_range_item *hr)
+{
+   if (!hr)
+   return;
+
+   if (atomic_dec_and_test(&hr->refs.refcount)) {
+   WARN_ON(hr->in_tree);
+   kmem_cache_free(hot_range_item_cache, hr);
+   }
+}
+
+static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
+   unsigned long inode_num,
+   struct rb_node *node)
+{
+   struct rb_node **p = &root->rb_node;
+   struct rb_node *parent = NULL;
+   struct hot_inode_item *entry;
+
+   /* walk tree to find insertion point */
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct hot_inode_item, rb_node);
+
+   if (inode_num < entry->i_ino)
+   p = &(*p)->rb_left;
+   else if (inode_num > entry->i_ino)
+   p = &(*p)->rb_right;
+   else
+   return parent;
+   }
+
+   entry = rb_entry(node, struct hot_inode_item, rb_node);
+   entry->in_tree = 1;
+   rb_link_node(node, parent, p);
+   rb_insert_color(node, root);
+
+   return NULL;
+}
+
+static u64 hot_rb_range_end(struct hot_range_item *hr)
+{
+   if (hr->start + hr->len < hr->start)
+   return (u64)-1;
+
+   return hr->start + hr->len - 1;
+}
+
+static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,
+   u64 start,
+   struct rb_node *node)
+{
+   struct rb_node **p = &root->rb_node;
+   struct rb_node *parent = NULL;
+   struct hot_range_item *entry;
+
+   /* ensure start is on a range boundary */
+   start = start & RANGE_SIZE_MASK;
+   /* walk tree to find insertion point */
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct hot_range_item, rb_node);
+
+   if (start < entry->start)
+   p = &(*p)->rb_left;
+   else if (start >= hot_rb_range_end(entry))
+   p = &(*p)->rb_right;
+   else
+   return parent;
+   }
+
+   entry = rb_entry(node, struct hot_range_item, rb_node);
+   entry->in_tree = 1;
+   rb_link_node(node, parent, p);
+   rb_insert_color(node, root);
+
+   return NULL;
+}
+
+/*
+ * Add a hot_inode_item to a hot_inode_tree. If the tree already contains
+ * an item with the index given, return -EEXIST
+ */
+static int hot_rb_add_hot_inode_item(struct hot_inode_tree *tree,
+   struct hot_inode_item *he)
+{
+   int ret = 0;
+   struct rb_node *rb;
+
+   rb = hot_rb_insert_hot_inode_item(
+   &tree->map, he->i_ino, &he->rb_node);
+   if (rb) {
+   ret = -EEXIST;
+   goto out;
+   }
+
+   kref_get(&he->refs);
+
+out:
+   return ret;
+}
+
+/*
+ * Add a hot_range_item to a hot_range_tree. If the tree already contains
+ * an item with the index given, return -EEXIST
+ *
+ * Also optionally aggresively merge ranges (currently disabled)
+ */
+static int hot_rb_add_hot_range_item(struct hot_range_tree *tree,
+   struct hot_range_item *hr)
+{
+   int ret = 0;
+   struct rb_node *rb;
+
+   rb = hot_rb_insert_hot_range_item(
+   &tree->map, hr->start, &hr->rb_node);
+   if (rb) {
+   ret = -EEXIST;
+   goto out;
+   }
+
+   kref_get(&hr->refs);
+
+out:
+   return ret;
+}
+
+/*
+ * Lookup a hot_inode_item in the hot_inode_tree with the given index
+ * (inode_num)
+ */
+struct hot_inode_item
+*hot_rb_lookup_hot_inode_item(struct hot_inode_tree *tree,
+   unsigned long inode_num)
+{
+   struct rb_node **p = &(tree->map.rb_node);
+   struct rb_node *parent = N