Re: [PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-26 Thread Andreas Dilger
On May 26, 2011, at 14:03, Andi Kleen wrote:
> On Thu, May 26, 2011 at 03:02:42PM -0400, Josef Bacik wrote:
>> +/*
>> + * If this dentry needs lookup, don't set the referenced flag so that it
>> + * is more likely to be cleaned up by the dcache shrinker in case of
>> + * memory pressure.
>> + */
>> +if (!d_need_lookup(dentry))
>> +dentry->d_flags |= DCACHE_REFERENCED;
> 
> No it doesn't at all. The allocation will just push everything else
> out.
> 
> Really you cannot view this by only looking at the dcache.
> You have to look at the complete VM behaviour. All the caches
> and the other memory interact.

Even without this patch, if you are doing "find /" there will be considerable 
memory pressure from all of the pages that are accessed by readdir(), and to a 
lesser extent the directory dentry/inode entries.  I'm not sure whether the 
issue you raise is going to be significant in the end or not.


Taking this development a bit further, I've long thought it would be quite 
interesting if these dcache entries could be linked into a readdir-ordered 
linked list, and then discard the actual directory pages from cache entirely.  
This would potentially allow later readdir operations to work just by walking 
the readdir-ordered linked list and not touching the page cache at all.

Since normal operations like "ls -l" currently instantiate both a pagecache for 
readdir, and a dentry for each dirent, it could potentially even reduce memory 
if there was no a need to keep the directory pages in cache anymore.

>>> d_alloc uses a normal GFP_KERNEL, which is quite in appropiate for this.
>>> 
>>> It should at least reclaim and probably more, but even then it's
>>> risky.
>>> 
>> 
>> Ah yeah I guess I should have probably used GFP_KERNEL.  Sorry about that,
> 
> GFP_KERNEL is already used, but it's wrong. I'm not sure any
> of the existing GFP_* flags will give you the semantics you
> need in fact. The new flag Minchan added for readahead may come
> near, but even that is probably not enough.
> 
> -Andi
> 
> -- 
> a...@linux.intel.com -- Speaking for myself only.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-26 Thread Josef Bacik
On 05/26/2011 04:03 PM, Andi Kleen wrote:
> On Thu, May 26, 2011 at 03:02:42PM -0400, Josef Bacik wrote:
>> +/*
>> + * If this dentry needs lookup, don't set the referenced flag so that it
>> + * is more likely to be cleaned up by the dcache shrinker in case of
>> + * memory pressure.
>> + */
>> +if (!d_need_lookup(dentry))
>> +dentry->d_flags |= DCACHE_REFERENCED;
> 
> No it doesn't at all. The allocation will just push everything else
> out.
> 
> Really you cannot view this by only looking at the dcache.
> You have to look at the complete VM behaviour. All the caches
> and the other memory interact.

Agreed, but this makes it monumentally easier to push these entries out
of the cache, which is the best I can do with what I have for now.

>>
>>
>>> d_alloc uses a normal GFP_KERNEL, which is quite in appropiate for this.
>>>
>>> It should at least reclaim and probably more, but even then it's
>>> risky.
>>>
>>
>> Ah yeah I guess I should have probably used GFP_KERNEL.  Sorry about that,
> 
> GFP_KERNEL is already used, but it's wrong. I'm not sure any
> of the existing GFP_* flags will give you the semantics you
> need in fact. The new flag Minchan added for readahead may come
> near, but even that is probably not enough.

Yeah if there was a GFP_DONTTRYTOHARD I would use that but there isn't.
 Maybe I'll code that up.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-26 Thread Andi Kleen
On Thu, May 26, 2011 at 03:02:42PM -0400, Josef Bacik wrote:
> + /*
> +  * If this dentry needs lookup, don't set the referenced flag so that it
> +  * is more likely to be cleaned up by the dcache shrinker in case of
> +  * memory pressure.
> +  */
> + if (!d_need_lookup(dentry))
> + dentry->d_flags |= DCACHE_REFERENCED;

No it doesn't at all. The allocation will just push everything else
out.

Really you cannot view this by only looking at the dcache.
You have to look at the complete VM behaviour. All the caches
and the other memory interact.
> 
> 
> > d_alloc uses a normal GFP_KERNEL, which is quite in appropiate for this.
> > 
> > It should at least reclaim and probably more, but even then it's
> > risky.
> > 
> 
> Ah yeah I guess I should have probably used GFP_KERNEL.  Sorry about that,

GFP_KERNEL is already used, but it's wrong. I'm not sure any
of the existing GFP_* flags will give you the semantics you
need in fact. The new flag Minchan added for readahead may come
near, but even that is probably not enough.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-26 Thread Josef Bacik
On 05/26/2011 02:50 PM, Andi Kleen wrote:
> Josef Bacik  writes:
>> +
>> +newkey = kzalloc(sizeof(struct btrfs_key),
>> + GFP_NOFS);
>> +if (!newkey)
>> +goto no_dentry;
>> +tmp = d_alloc(filp->f_dentry, &q);
> 
> This doesn't seem to address the "find / fills all memory with dentries"
> concerns brought up earlier at all.
> 

Nope, this part does in patch 1/1

+   /*
+* If this dentry needs lookup, don't set the referenced flag so that it
+* is more likely to be cleaned up by the dcache shrinker in case of
+* memory pressure.
+*/
+   if (!d_need_lookup(dentry))
+   dentry->d_flags |= DCACHE_REFERENCED;


> d_alloc uses a normal GFP_KERNEL, which is quite in appropiate for this.
> 
> It should at least reclaim and probably more, but even then it's
> risky.
> 

Ah yeah I guess I should have probably used GFP_KERNEL.  Sorry about that,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-26 Thread Andi Kleen
Josef Bacik  writes:
> +
> + newkey = kzalloc(sizeof(struct btrfs_key),
> +  GFP_NOFS);
> + if (!newkey)
> + goto no_dentry;
> + tmp = d_alloc(filp->f_dentry, &q);

This doesn't seem to address the "find / fills all memory with dentries"
concerns brought up earlier at all.

d_alloc uses a normal GFP_KERNEL, which is quite in appropiate for this.

It should at least reclaim and probably more, but even then it's
risky.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-26 Thread Josef Bacik
In btrfs we have 2 indexes for inodes.  One is for readdir, it's in this nice
sequential order and works out brilliantly for readdir.  However if you use ls,
it usually stat's each file it gets from readdir.  This is where the second
index comes in, which is based on a hash of the name of the file.  So then the
lookup has to lookup this index, and then lookup the inode.  The index lookup is
going to be in random order (since its based on the name hash), which gives us
less than stellar performance.  Since we know the inode location from the
readdir index, I create a dummy dentry and copy the location key into
dentry->d_fsdata.  Then on lookup if we have d_fsdata we use that location to
lookup the inode, avoiding looking up the other directory index.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/inode.c |   47 +--
 1 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6228a30..2a1a028 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4120,12 +4120,19 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, 
struct dentry *dentry)
struct btrfs_root *sub_root = root;
struct btrfs_key location;
int index;
-   int ret;
+   int ret = 0;
 
if (dentry->d_name.len > BTRFS_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
 
-   ret = btrfs_inode_by_name(dir, dentry, &location);
+   if (unlikely(d_need_lookup(dentry))) {
+   memcpy(&location, dentry->d_fsdata, sizeof(struct btrfs_key));
+   kfree(dentry->d_fsdata);
+   dentry->d_fsdata = NULL;
+   d_clear_need_lookup(dentry);
+   } else {
+   ret = btrfs_inode_by_name(dir, dentry, &location);
+   }
 
if (ret < 0)
return ERR_PTR(ret);
@@ -4180,6 +4187,12 @@ static int btrfs_dentry_delete(const struct dentry 
*dentry)
return 0;
 }
 
+static void btrfs_dentry_release(struct dentry *dentry)
+{
+   if (dentry->d_fsdata)
+   kfree(dentry->d_fsdata);
+}
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
   struct nameidata *nd)
 {
@@ -4206,6 +4219,7 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_path *path;
+   struct qstr q;
int ret;
struct extent_buffer *leaf;
int slot;
@@ -4284,6 +4298,7 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
 
while (di_cur < di_total) {
struct btrfs_key location;
+   struct dentry *tmp;
 
if (verify_dir_item(root, leaf, di))
break;
@@ -4304,6 +4319,33 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
d_type = btrfs_filetype_table[btrfs_dir_type(leaf, di)];
btrfs_dir_item_key_to_cpu(leaf, di, &location);
 
+   q.name = name_ptr;
+   q.len = name_len;
+   q.hash = full_name_hash(q.name, q.len);
+   tmp = d_lookup(filp->f_dentry, &q);
+   if (!tmp) {
+   struct btrfs_key *newkey;
+
+   newkey = kzalloc(sizeof(struct btrfs_key),
+GFP_NOFS);
+   if (!newkey)
+   goto no_dentry;
+   tmp = d_alloc(filp->f_dentry, &q);
+   if (!tmp) {
+   kfree(newkey);
+   dput(tmp);
+   goto no_dentry;
+   }
+   memcpy(newkey, &location,
+  sizeof(struct btrfs_key));
+   tmp->d_fsdata = newkey;
+   tmp->d_flags |= DCACHE_NEED_LOOKUP;
+   d_rehash(tmp);
+   dput(tmp);
+   } else {
+   dput(tmp);
+   }
+no_dentry:
/* is this a reference to our own snapshot? If so
 * skip it
 */
@@ -7566,4 +7608,5 @@ static const struct inode_operations 
btrfs_symlink_inode_operations = {
 
 const struct dentry_operations btrfs_dentry_operations = {
.d_delete   = btrfs_dentry_delete,
+   .d_release  = btrfs_dentry_release,
 };
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry V2

2011-05-20 Thread Josef Bacik
In btrfs we have 2 indexes for inodes.  One is for readdir, it's in this nice
sequential order and works out brilliantly for readdir.  However if you use ls,
it usually stat's each file it gets from readdir.  This is where the second
index comes in, which is based on a hash of the name of the file.  So then the
lookup has to lookup this index, and then lookup the inode.  The index lookup is
going to be in random order (since its based on the name hash), which gives us
less than stellar performance.  Since we know the inode location from the
readdir index, I create a dummy dentry and copy the location key into
dentry->d_fsdata.  Then on lookup if we have d_fsdata we use that location to
lookup the inode, avoiding looking up the other directory index.  Thanks,

Signed-off-by: Josef Bacik 
---
V1->V2: use the d_clear_need_lookup() helper instead of d_drop.

 fs/btrfs/inode.c |   47 +--
 1 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6228a30..2a1a028 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4120,12 +4120,19 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, 
struct dentry *dentry)
struct btrfs_root *sub_root = root;
struct btrfs_key location;
int index;
-   int ret;
+   int ret = 0;
 
if (dentry->d_name.len > BTRFS_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
 
-   ret = btrfs_inode_by_name(dir, dentry, &location);
+   if (unlikely(d_need_lookup(dentry))) {
+   memcpy(&location, dentry->d_fsdata, sizeof(struct btrfs_key));
+   kfree(dentry->d_fsdata);
+   dentry->d_fsdata = NULL;
+   d_clear_need_lookup(dentry);
+   } else {
+   ret = btrfs_inode_by_name(dir, dentry, &location);
+   }
 
if (ret < 0)
return ERR_PTR(ret);
@@ -4180,6 +4187,12 @@ static int btrfs_dentry_delete(const struct dentry 
*dentry)
return 0;
 }
 
+static void btrfs_dentry_release(struct dentry *dentry)
+{
+   if (dentry->d_fsdata)
+   kfree(dentry->d_fsdata);
+}
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
   struct nameidata *nd)
 {
@@ -4206,6 +4219,7 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_path *path;
+   struct qstr q;
int ret;
struct extent_buffer *leaf;
int slot;
@@ -4284,6 +4298,7 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
 
while (di_cur < di_total) {
struct btrfs_key location;
+   struct dentry *tmp;
 
if (verify_dir_item(root, leaf, di))
break;
@@ -4304,6 +4319,33 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
d_type = btrfs_filetype_table[btrfs_dir_type(leaf, di)];
btrfs_dir_item_key_to_cpu(leaf, di, &location);
 
+   q.name = name_ptr;
+   q.len = name_len;
+   q.hash = full_name_hash(q.name, q.len);
+   tmp = d_lookup(filp->f_dentry, &q);
+   if (!tmp) {
+   struct btrfs_key *newkey;
+
+   newkey = kzalloc(sizeof(struct btrfs_key),
+GFP_NOFS);
+   if (!newkey)
+   goto no_dentry;
+   tmp = d_alloc(filp->f_dentry, &q);
+   if (!tmp) {
+   kfree(newkey);
+   dput(tmp);
+   goto no_dentry;
+   }
+   memcpy(newkey, &location,
+  sizeof(struct btrfs_key));
+   tmp->d_fsdata = newkey;
+   tmp->d_flags |= DCACHE_NEED_LOOKUP;
+   d_rehash(tmp);
+   dput(tmp);
+   } else {
+   dput(tmp);
+   }
+no_dentry:
/* is this a reference to our own snapshot? If so
 * skip it
 */
@@ -7566,4 +7608,5 @@ static const struct inode_operations 
btrfs_symlink_inode_operations = {
 
 const struct dentry_operations btrfs_dentry_operations = {
.d_delete   = btrfs_dentry_delete,
+   .d_release  = btrfs_dentry_release,
 };
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo

[PATCH 2/2] Btrfs: load the key from the dir item in readdir into a fake dentry

2011-05-19 Thread Josef Bacik
In btrfs we have 2 indexes for inodes.  One is for readdir, it's in this nice
sequential order and works out brilliantly for readdir.  However if you use ls,
it usually stat's each file it gets from readdir.  This is where the second
index comes in, which is based on a hash of the name of the file.  So then the
lookup has to lookup this index, and then lookup the inode.  The index lookup is
going to be in random order (since its based on the name hash), which gives us
less than stellar performance.  Since we know the inode location from the
readdir index, I create a dummy dentry and copy the location key into
dentry->d_fsdata.  Then on lookup if we have d_fsdata we use that location to
lookup the inode, avoiding looking up the other directory index.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/inode.c |   51 +--
 1 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6228a30..3cd246c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4120,12 +4120,23 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, 
struct dentry *dentry)
struct btrfs_root *sub_root = root;
struct btrfs_key location;
int index;
-   int ret;
+   int ret = 0;
 
if (dentry->d_name.len > BTRFS_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
 
-   ret = btrfs_inode_by_name(dir, dentry, &location);
+   if (dentry->d_fsdata) {
+   memcpy(&location, dentry->d_fsdata, sizeof(struct btrfs_key));
+   kfree(dentry->d_fsdata);
+   dentry->d_fsdata = NULL;
+   /*
+* We need to unhash this dentry so we can rehash it when we
+* find the inode.
+*/
+   d_drop(dentry);
+   } else {
+   ret = btrfs_inode_by_name(dir, dentry, &location);
+   }
 
if (ret < 0)
return ERR_PTR(ret);
@@ -4180,6 +4191,12 @@ static int btrfs_dentry_delete(const struct dentry 
*dentry)
return 0;
 }
 
+static void btrfs_dentry_release(struct dentry *dentry)
+{
+   if (dentry->d_fsdata)
+   kfree(dentry->d_fsdata);
+}
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
   struct nameidata *nd)
 {
@@ -4206,6 +4223,7 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_path *path;
+   struct qstr q;
int ret;
struct extent_buffer *leaf;
int slot;
@@ -4284,6 +4302,7 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
 
while (di_cur < di_total) {
struct btrfs_key location;
+   struct dentry *tmp;
 
if (verify_dir_item(root, leaf, di))
break;
@@ -4304,6 +4323,33 @@ static int btrfs_real_readdir(struct file *filp, void 
*dirent,
d_type = btrfs_filetype_table[btrfs_dir_type(leaf, di)];
btrfs_dir_item_key_to_cpu(leaf, di, &location);
 
+   q.name = name_ptr;
+   q.len = name_len;
+   q.hash = full_name_hash(q.name, q.len);
+   tmp = d_lookup(filp->f_dentry, &q);
+   if (!tmp) {
+   struct btrfs_key *newkey;
+
+   newkey = kzalloc(sizeof(struct btrfs_key),
+GFP_NOFS);
+   if (!newkey)
+   goto no_dentry;
+   tmp = d_alloc(filp->f_dentry, &q);
+   if (!tmp) {
+   kfree(newkey);
+   dput(tmp);
+   goto no_dentry;
+   }
+   memcpy(newkey, &location,
+  sizeof(struct btrfs_key));
+   tmp->d_fsdata = newkey;
+   tmp->d_flags |= DCACHE_NEED_LOOKUP;
+   d_rehash(tmp);
+   dput(tmp);
+   } else {
+   dput(tmp);
+   }
+no_dentry:
/* is this a reference to our own snapshot? If so
 * skip it
 */
@@ -7566,4 +7612,5 @@ static const struct inode_operations 
btrfs_symlink_inode_operations = {
 
 const struct dentry_operations btrfs_dentry_operations = {
.d_delete   = btrfs_dentry_delete,
+   .d_release  = btrfs_dentry_release,
 };
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the