Re: [RFC v2 02/10] vfs: add support for updating access frequency
On Thu, Sep 27, 2012 at 10:19 AM, Dave Chinner wrote:
> On Wed, Sep 26, 2012 at 10:53:07AM +0800, Zhi Yong Wu wrote:
>> On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner wrote:
>> > On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
>> > I note that the code will always insert range items of a length
>> > RANGE_SIZE. This means you have a fixed object granularity and hence
>> > you have no need for a range based search. That is, you could use a
>> > radix tree where each entry in the radix tree points directly to the
>> > range object, similar to how the page cache uses a radix tree for
>> > indexing pages. That brings the possibility of lockless range item
>> > lookups.
>>
>> Great suggestion, but can we temporarily put it on the TODO list? It
>> will bring one big code change.
>
> Sure. I just wanted to point out that there are better choices for
> indexing fixed size elements than rb-trees, and why it might make
> sense to use a different type of tree.

Got it, thanks. Moreover, it would also be better to use a radix tree to
hold hot_inode items, not only hot_range items.

> Cheers,
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

--
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC v2 02/10] vfs: add support for updating access frequency
On Wed, Sep 26, 2012 at 10:53:07AM +0800, Zhi Yong Wu wrote:
> On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner wrote:
> > On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
> > I note that the code will always insert range items of a length
> > RANGE_SIZE. This means you have a fixed object granularity and hence
> > you have no need for a range based search. That is, you could use a
> > radix tree where each entry in the radix tree points directly to the
> > range object, similar to how the page cache uses a radix tree for
> > indexing pages. That brings the possibility of lockless range item
> > lookups.
>
> Great suggestion, but can we temporarily put it on the TODO list? It
> will bring one big code change.

Sure. I just wanted to point out that there are better choices for
indexing fixed size elements than rb-trees, and why it might make
sense to use a different type of tree.

Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [RFC v2 02/10] vfs: add support for updating access frequency
Thanks a lot for your review, Dave. It is very helpful to me.

On Tue, Sep 25, 2012 at 5:17 PM, Dave Chinner wrote:
> On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
>> From: Zhi Yong Wu
>>
>> Add some utility helpers to update access frequencies
>> for one file or its range.
>>
>> Signed-off-by: Zhi Yong Wu
>> ---
>>  fs/hot_tracking.c | 359 ++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/hot_tracking.h |  15 +++
>>  2 files changed, 374 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
>> index 173054b..52ed926 100644
>> --- a/fs/hot_tracking.c
>> +++ b/fs/hot_tracking.c
>> @@ -106,6 +106,365 @@ inode_err:
>>  }
>>
>>  /*
>> + * Drop the reference on a hot_inode_item by one and free the structure
>> + * if the reference count hits zero.
>> + */
>> +void hot_rb_free_hot_inode_item(struct hot_inode_item *he)
>
> hot_inode_item_put()
>
>> +{
>> +	if (!he)
>> +		return;
>
> It's a bug to call a put function on a kref counted item with a null
> pointer. Let the kernel crash so it is noticed and fixed.

OK, will remove it.

>> +
>> +	if (atomic_dec_and_test(&he->refs.refcount)) {
>> +		WARN_ON(he->in_tree);
>> +		kmem_cache_free(hot_inode_item_cache, he);
>> +	}
>
> Isn't this abusing krefs? i.e. this should be:

Thanks for the explanation below.

> hot_inode_item_free()
> {
> 	WARN_ON(he->in_tree);
> 	kmem_cache_free(hot_inode_item_cache, he);
> }
>
> hot_inode_item_put()
> {
> 	kref_put(&he->refs, hot_inode_item_free)
> }
>
>> +/*
>> + * Drop the reference on a hot_range_item by one and free the structure
>> + * if the reference count hits zero.
>> + */
>> +static void hot_rb_free_hot_range_item(struct hot_range_item *hr)
>
> same comments as above.

OK, thanks.

>> +static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
>> +				unsigned long inode_num,
>> +				struct rb_node *node)
>
> static struct rb_node *
> hot_inode_item_find(. )

OK, thanks.

>> +{
>> +	struct rb_node **p = &root->rb_node;
>> +	struct rb_node *parent = NULL;
>> +	struct hot_inode_item *entry;
>> +
>> +	/* walk tree to find insertion point */
>> +	while (*p) {
>> +		parent = *p;
>> +		entry = rb_entry(parent, struct hot_inode_item, rb_node);
>> +
>> +		if (inode_num < entry->i_ino)
>> +			p = &(*p)->rb_left;
>> +		else if (inode_num > entry->i_ino)
>> +			p = &(*p)->rb_right;
>> +		else
>> +			return parent;
>> +	}
>> +
>> +	entry = rb_entry(node, struct hot_inode_item, rb_node);
>> +	entry->in_tree = 1;
>> +	rb_link_node(node, parent, p);
>> +	rb_insert_color(node, root);
>
> So the hot inode item search key is the inode number? Why use an

Yes.

> rb-tree then? Wouldn't a btree be a more memory efficient way to
> hold a sparse index that has fixed key and pointer sizes?

Yes, I know, but if we don't use a btree, what would be better? A radix
tree?

> Also, the API seems quite strange. You pass in the rb_node and the
> inode number instead of passing in the hot inode item that already
> holds this information. You then convert the rb_node back to a hot
> inode item to set the in_tree variable. So why not just pass in the
> struct hot_inode_item in the first place?

Good catch, thanks for the reminder.

>> +static u64 hot_rb_range_end(struct hot_range_item *hr)
>
> hot_range_item_end()

OK.

>> +{
>> +	if (hr->start + hr->len < hr->start)
>> +		return (u64)-1;
>> +
>> +	return hr->start + hr->len - 1;
>> +}
>> +
>> +static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,
>
> hot_range_item_find()

OK.

>> +				u64 start,
>> +				struct rb_node *node)
>> +{
>> +	struct rb_node **p = &root->rb_node;
>> +	struct rb_node *parent = NULL;
>> +	struct hot_range_item *entry;
>> +
>> +	/* ensure start is on a range boundary */
>> +	start = start & RANGE_SIZE_MASK;
>> +	/* walk tree to find insertion point */
>> +	while (*p) {
>> +		parent = *p;
>> +		entry = rb_entry(parent, struct hot_range_item, rb_node);
>> +
>> +		if (start < entry->start)
>> +			p = &(*p)->rb_left;
>> +		else if (start >= hot_rb_range_end(entry))
>
> Shouldn't an aligned end always be one byte short of the start
> offset of the next aligned region? i.e. start ==
> hot_rb_range_end(entry) is an indication of an off-by-one bug
> somewhere?

This is really a good catch, thanks.

>> +			p = &(*p)->rb_right;
>> +		else
>> +			return parent;
>> +	}
>> +
>> +	entry = rb_entry(node, struct hot_range_item, rb_node);
Re: [RFC v2 02/10] vfs: add support for updating access frequency
On Sun, Sep 23, 2012 at 08:56:27PM +0800, zwu.ker...@gmail.com wrote:
> From: Zhi Yong Wu
>
> Add some utility helpers to update access frequencies
> for one file or its range.
>
> Signed-off-by: Zhi Yong Wu
> ---
>  fs/hot_tracking.c | 359 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/hot_tracking.h |  15 +++
>  2 files changed, 374 insertions(+), 0 deletions(-)
>
> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
> index 173054b..52ed926 100644
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -106,6 +106,365 @@ inode_err:
>  }
>
>  /*
> + * Drop the reference on a hot_inode_item by one and free the structure
> + * if the reference count hits zero.
> + */
> +void hot_rb_free_hot_inode_item(struct hot_inode_item *he)

hot_inode_item_put()

> +{
> +	if (!he)
> +		return;

It's a bug to call a put function on a kref counted item with a null
pointer. Let the kernel crash so it is noticed and fixed.

> +
> +	if (atomic_dec_and_test(&he->refs.refcount)) {
> +		WARN_ON(he->in_tree);
> +		kmem_cache_free(hot_inode_item_cache, he);
> +	}

Isn't this abusing krefs? i.e. this should be:

hot_inode_item_free()
{
	WARN_ON(he->in_tree);
	kmem_cache_free(hot_inode_item_cache, he);
}

hot_inode_item_put()
{
	kref_put(&he->refs, hot_inode_item_free)
}

> +/*
> + * Drop the reference on a hot_range_item by one and free the structure
> + * if the reference count hits zero.
> + */
> +static void hot_rb_free_hot_range_item(struct hot_range_item *hr)

same comments as above.

> +static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
> +				unsigned long inode_num,
> +				struct rb_node *node)

static struct rb_node *
hot_inode_item_find(. )

> +{
> +	struct rb_node **p = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct hot_inode_item *entry;
> +
> +	/* walk tree to find insertion point */
> +	while (*p) {
> +		parent = *p;
> +		entry = rb_entry(parent, struct hot_inode_item, rb_node);
> +
> +		if (inode_num < entry->i_ino)
> +			p = &(*p)->rb_left;
> +		else if (inode_num > entry->i_ino)
> +			p = &(*p)->rb_right;
> +		else
> +			return parent;
> +	}
> +
> +	entry = rb_entry(node, struct hot_inode_item, rb_node);
> +	entry->in_tree = 1;
> +	rb_link_node(node, parent, p);
> +	rb_insert_color(node, root);

So the hot inode item search key is the inode number? Why use an
rb-tree then? Wouldn't a btree be a more memory efficient way to
hold a sparse index that has fixed key and pointer sizes?

Also, the API seems quite strange. You pass in the rb_node and the
inode number instead of passing in the hot inode item that already
holds this information. You then convert the rb_node back to a hot
inode item to set the in_tree variable. So why not just pass in the
struct hot_inode_item in the first place?

> +static u64 hot_rb_range_end(struct hot_range_item *hr)

hot_range_item_end()

> +{
> +	if (hr->start + hr->len < hr->start)
> +		return (u64)-1;
> +
> +	return hr->start + hr->len - 1;
> +}
> +
> +static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,

hot_range_item_find()

> +				u64 start,
> +				struct rb_node *node)
> +{
> +	struct rb_node **p = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +	struct hot_range_item *entry;
> +
> +	/* ensure start is on a range boundary */
> +	start = start & RANGE_SIZE_MASK;
> +	/* walk tree to find insertion point */
> +	while (*p) {
> +		parent = *p;
> +		entry = rb_entry(parent, struct hot_range_item, rb_node);
> +
> +		if (start < entry->start)
> +			p = &(*p)->rb_left;
> +		else if (start >= hot_rb_range_end(entry))

Shouldn't an aligned end always be one byte short of the start
offset of the next aligned region? i.e. start ==
hot_rb_range_end(entry) is an indication of an off-by-one bug
somewhere?

> +			p = &(*p)->rb_right;
> +		else
> +			return parent;
> +	}
> +
> +	entry = rb_entry(node, struct hot_range_item, rb_node);
> +	entry->in_tree = 1;
> +	rb_link_node(node, parent, p);
> +	rb_insert_color(node, root);

Same comments as the hot_inode_range. Also, the start offset in the
hot_range_item is already aligned, so why do you need to align it
again?

> +
> +	return NULL;
> +}
> +
> +/*
> + * Add a hot_inode_item to a hot_inode_tree. If the tree already contains
> + * an item with the index given, return -EEXIST
> + */
> +static int hot_rb_add_hot_inode_item(struct hot_inode_tree *tree,
> +				struct hot_inode_item *he)
[RFC v2 02/10] vfs: add support for updating access frequency
From: Zhi Yong Wu

Add some utility helpers to update access frequencies
for one file or its range.

Signed-off-by: Zhi Yong Wu
---
 fs/hot_tracking.c | 359 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/hot_tracking.h |  15 +++
 2 files changed, 374 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 173054b..52ed926 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -106,6 +106,365 @@ inode_err:
 }

 /*
+ * Drop the reference on a hot_inode_item by one and free the structure
+ * if the reference count hits zero.
+ */
+void hot_rb_free_hot_inode_item(struct hot_inode_item *he)
+{
+	if (!he)
+		return;
+
+	if (atomic_dec_and_test(&he->refs.refcount)) {
+		WARN_ON(he->in_tree);
+		kmem_cache_free(hot_inode_item_cache, he);
+	}
+}
+
+/*
+ * Drop the reference on a hot_range_item by one and free the structure
+ * if the reference count hits zero.
+ */
+static void hot_rb_free_hot_range_item(struct hot_range_item *hr)
+{
+	if (!hr)
+		return;
+
+	if (atomic_dec_and_test(&hr->refs.refcount)) {
+		WARN_ON(hr->in_tree);
+		kmem_cache_free(hot_range_item_cache, hr);
+	}
+}
+
+static struct rb_node *hot_rb_insert_hot_inode_item(struct rb_root *root,
+				unsigned long inode_num,
+				struct rb_node *node)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct hot_inode_item *entry;
+
+	/* walk tree to find insertion point */
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct hot_inode_item, rb_node);
+
+		if (inode_num < entry->i_ino)
+			p = &(*p)->rb_left;
+		else if (inode_num > entry->i_ino)
+			p = &(*p)->rb_right;
+		else
+			return parent;
+	}
+
+	entry = rb_entry(node, struct hot_inode_item, rb_node);
+	entry->in_tree = 1;
+	rb_link_node(node, parent, p);
+	rb_insert_color(node, root);
+
+	return NULL;
+}
+
+static u64 hot_rb_range_end(struct hot_range_item *hr)
+{
+	if (hr->start + hr->len < hr->start)
+		return (u64)-1;
+
+	return hr->start + hr->len - 1;
+}
+
+static struct rb_node *hot_rb_insert_hot_range_item(struct rb_root *root,
+				u64 start,
+				struct rb_node *node)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct hot_range_item *entry;
+
+	/* ensure start is on a range boundary */
+	start = start & RANGE_SIZE_MASK;
+	/* walk tree to find insertion point */
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct hot_range_item, rb_node);
+
+		if (start < entry->start)
+			p = &(*p)->rb_left;
+		else if (start >= hot_rb_range_end(entry))
+			p = &(*p)->rb_right;
+		else
+			return parent;
+	}
+
+	entry = rb_entry(node, struct hot_range_item, rb_node);
+	entry->in_tree = 1;
+	rb_link_node(node, parent, p);
+	rb_insert_color(node, root);
+
+	return NULL;
+}
+
+/*
+ * Add a hot_inode_item to a hot_inode_tree. If the tree already contains
+ * an item with the index given, return -EEXIST
+ */
+static int hot_rb_add_hot_inode_item(struct hot_inode_tree *tree,
+				struct hot_inode_item *he)
+{
+	int ret = 0;
+	struct rb_node *rb;
+
+	rb = hot_rb_insert_hot_inode_item(
+			&tree->map, he->i_ino, &he->rb_node);
+	if (rb) {
+		ret = -EEXIST;
+		goto out;
+	}
+
+	kref_get(&he->refs);
+
+out:
+	return ret;
+}
+
+/*
+ * Add a hot_range_item to a hot_range_tree. If the tree already contains
+ * an item with the index given, return -EEXIST
+ *
+ * Also optionally aggressively merge ranges (currently disabled)
+ */
+static int hot_rb_add_hot_range_item(struct hot_range_tree *tree,
+				struct hot_range_item *hr)
+{
+	int ret = 0;
+	struct rb_node *rb;
+
+	rb = hot_rb_insert_hot_range_item(
+			&tree->map, hr->start, &hr->rb_node);
+	if (rb) {
+		ret = -EEXIST;
+		goto out;
+	}
+
+	kref_get(&hr->refs);
+
+out:
+	return ret;
+}
+
+/*
+ * Lookup a hot_inode_item in the hot_inode_tree with the given index
+ * (inode_num)
+ */
+struct hot_inode_item
+*hot_rb_lookup_hot_inode_item(struct hot_inode_tree *tree,
+				unsigned long inode_num)
+{
+	struct rb_node **p = &(tree->map.rb_node);
+	struct rb_node *parent = NULL;