Re: [RFC] pdirops: vfs patch

2005-02-23 Thread Alex Tomas
> Jan Blunck (JB) writes:

 JB> Nope, d_alloc() sets d_flags to DCACHE_UNHASHED. Therefore the dentry is
 JB> not found by __d_lookup() until it is rehashed, which is done implicitly
 JB> by ->lookup().

That means two processes can each allocate a dentry for the same name. They
each call ->lookup() against their own dentry, fill it, and hash it, so we
end up with two equal dentries. Is that OK?

thanks, Alex

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] pdirops: vfs patch

2005-02-22 Thread Jan Blunck
Quoting Alex Tomas <[EMAIL PROTECTED]>:

> > Jan Blunck (JB) writes:
>
>  JB> i_sem does NOT protect the dcache, not even in real_lookup(). The lock
>  JB> must be acquired for ->lookup(), and because we might sleep on i_sem, we
>  JB> have to get it early and check for repopulation of the dcache.
>
> A dentry is part of the dcache, right? And i_sem protects a dentry from
> being returned with incomplete data inside, right?
>

Nope, d_alloc() sets d_flags to DCACHE_UNHASHED. Therefore the dentry is not
found by __d_lookup() until it is rehashed, which is done implicitly by
->lookup().

Jan


Re: [RFC] pdirops: vfs patch

2005-02-22 Thread Alex Tomas
> Jan Blunck (JB) writes:

 JB> i_sem does NOT protect the dcache, not even in real_lookup(). The lock
 JB> must be acquired for ->lookup(), and because we might sleep on i_sem, we
 JB> have to get it early and check for repopulation of the dcache.

A dentry is part of the dcache, right? And i_sem protects a dentry from being
returned with incomplete data inside, right?


thanks, Alex



Re: [RFC] pdirops: vfs patch

2005-02-22 Thread Jan Blunck
Quoting Alex Tomas <[EMAIL PROTECTED]>:

> > Jan Blunck (JB) writes:
>
>  >> 1) i_sem protects dcache too
>
>  JB> Where? i_sem is the per-inode lock and shouldn't be used for anything else.
>
> read the comments in fs/namei.c:real_lookup()
>

i_sem does NOT protect the dcache, not even in real_lookup(). The lock must be
acquired for ->lookup(), and because we might sleep on i_sem, we have to get it
early and check for repopulation of the dcache.

> We've already done this for ext3; it works, and it speeds some loads up
> significantly, especially on big directories. You can control it via a mount
> option, so the cost is almost zero for filesystems that don't support
> pdirops.

Ok.

Jan


Re: [RFC] pdirops: vfs patch

2005-02-22 Thread Alex Tomas
> Jan Blunck (JB) writes:

 >> 1) i_sem protects dcache too

 JB> Where? i_sem is the per-inode lock and shouldn't be used for anything else.

read the comments in fs/namei.c:real_lookup()

 >> 2) tmpfs has no "own" data, so we can use it this way (see 2nd patch)
 >> 3) I have pdirops patch for ext3, but it needs some cleaning ...

 JB> I think you didn't get my point.

 JB> 1) Your approach duplicates the locking effort for a regular filesystem
 JB> (like ext2):
 JB> a) locking with s_pdirops_sems
 JB> b) locking the low-level filesystem data
 JB> It's cool that it speeds up tmpfs, but I don't think that justifies the
 JB> doubled locking for every other filesystem. I'm also not sure it
 JB> increases performance for regular filesystems if you do the locking
 JB> right.

We've already done this for ext3; it works, and it speeds some loads up
significantly, especially on big directories. You can control it via a mount
option, so the cost is almost zero for filesystems that don't support pdirops.

 JB> 2) In my opinion, a superblock-wide semaphore array that allows up to 1024
 JB> different accesses (different names and different operations) to ONE
 JB> single inode (which is the data it should protect) is not a good idea.

Yes, it has some weaknesses; I'm reworking the VFS patch to avoid inter-inode
collisions.

thanks, Alex



Re: [RFC] pdirops: vfs patch

2005-02-22 Thread Jan Blunck
Quoting Alex Tomas <[EMAIL PROTECTED]>:

>
> 1) i_sem protects dcache too

Where? i_sem is the per-inode lock and shouldn't be used for anything else.

> 2) tmpfs has no "own" data, so we can use it this way (see 2nd patch)
> 3) I have pdirops patch for ext3, but it needs some cleaning ...

I think you didn't get my point.

1) Your approach duplicates the locking effort for a regular filesystem
(like ext2):
a) locking with s_pdirops_sems
b) locking the low-level filesystem data
It's cool that it speeds up tmpfs, but I don't think that justifies the doubled
locking for every other filesystem. I'm also not sure it increases performance
for regular filesystems if you do the locking right.

2) In my opinion, a superblock-wide semaphore array that allows up to 1024
different accesses (different names and different operations) to ONE single
inode (which is the data it should protect) is not a good idea.

Jan


Re: [RFC] pdirops: vfs patch

2005-02-20 Thread Alex Tomas
> Jan Blunck (JB) writes:

 JB> With luck you have s_pdirops_size (i.e. 1024) different renames
 JB> concurrently altering one directory inode. Therefore you need a lock
 JB> protecting your filesystem data; this is basically the job done by i_sem.
 JB> So in my opinion you only move "The Problem" from the VFS to the
 JB> low-level filesystems, but then there is no need for i_sem or your
 JB> s_pdirops_sems anymore.

1) i_sem protects dcache too
2) tmpfs has no "own" data, so we can use it this way (see 2nd patch)
3) I have pdirops patch for ext3, but it needs some cleaning ...

thanks, Alex



Re: [RFC] pdirops: vfs patch

2005-02-20 Thread Jan Blunck
Alex Tomas wrote:
+static inline struct semaphore *lock_sem(struct inode *dir, struct qstr *name)
+{
+	if (IS_PDIROPS(dir)) {
+		struct super_block *sb;
+		/* name->hash expected to be already calculated */
+		sb = dir->i_sb;
+		BUG_ON(sb->s_pdirops_sems == NULL);
+		return sb->s_pdirops_sems + name->hash % sb->s_pdirops_size;
+	}
+	return &dir->i_sem;
+}
+
+static inline void lock_dir(struct inode *dir, struct qstr *name)
+{
+	down(lock_sem(dir, name));
+}
+

@@ -1182,12 +1204,26 @@
 /*
  * p1 and p2 should be directories on the same fs.
  */
-struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
+struct dentry *lock_rename(struct dentry *p1, struct qstr *n1,
+struct dentry *p2, struct qstr *n2)
 {
 	struct dentry *p;
 
 	if (p1 == p2) {
-		down(&p1->d_inode->i_sem);
+		if (IS_PDIROPS(p1->d_inode)) {
+			unsigned int h1, h2;
+			h1 = n1->hash % p1->d_inode->i_sb->s_pdirops_size;
+			h2 = n2->hash % p2->d_inode->i_sb->s_pdirops_size;
+			if (h1 < h2) {
+				lock_dir(p1->d_inode, n1);
+				lock_dir(p2->d_inode, n2);
+			} else if (h1 > h2) {
+				lock_dir(p2->d_inode, n2);
+				lock_dir(p1->d_inode, n1);
+			} else
+				lock_dir(p1->d_inode, n1);
+		} else
+			down(&p1->d_inode->i_sem);
 		return NULL;
 	}
 
@@ -1195,31 +1231,35 @@
 
 	for (p = p1; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p2) {
-			down(&p2->d_inode->i_sem);
-			down(&p1->d_inode->i_sem);
+			lock_dir(p2->d_inode, n2);
+			lock_dir(p1->d_inode, n1);
 			return p;
 		}
 	}
 
 	for (p = p2; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p1) {
-			down(&p1->d_inode->i_sem);
-			down(&p2->d_inode->i_sem);
+			lock_dir(p1->d_inode, n1);
+			lock_dir(p2->d_inode, n2);
 			return p;
 		}
 	}
 
-	down(&p1->d_inode->i_sem);
-	down(&p2->d_inode->i_sem);
+	lock_dir(p1->d_inode, n1);
+	lock_dir(p2->d_inode, n2);
 	return NULL;
 }
With luck you have s_pdirops_size (or 1024) different renames altering 
concurrently one directory inode. Therefore you need a lock protecting 
your filesystem data. This is basically the job done by i_sem. So in my 
opinion you only move "The Problem" from the VFS to the lowlevel 
filesystems. But then there is no need for i_sem or your s_pdirops_sems 
anymore.

Regards,
Jan


Re: [RFC] pdirops: vfs patch

2005-02-19 Thread Alex Tomas


 fs/inode.c |1 
 fs/namei.c |   66 ++---
 include/linux/fs.h |   11 
 3 files changed, 54 insertions(+), 24 deletions(-)

Index: linux-2.6.10/fs/namei.c
===================================================================
--- linux-2.6.10.orig/fs/namei.c	2005-01-28 19:32:13.000000000 +0300
+++ linux-2.6.10/fs/namei.c	2005-02-19 20:40:05.763437304 +0300
@@ -104,6 +104,28 @@
  * any extra contention...
  */
 
+static inline struct semaphore *lock_sem(struct inode *dir, struct qstr *name)
+{
+	if (IS_PDIROPS(dir)) {
+		struct super_block *sb;
+		/* name->hash expected to be already calculated */
+		sb = dir->i_sb;
+		BUG_ON(sb->s_pdirops_sems == NULL);
+		return sb->s_pdirops_sems + name->hash % sb->s_pdirops_size;
+	}
+	return &dir->i_sem;
+}
+
+static inline void lock_dir(struct inode *dir, struct qstr *name)
+{
+	down(lock_sem(dir, name));
+}
+
+static inline void unlock_dir(struct inode *dir, struct qstr *name)
+{
+	up(lock_sem(dir, name));
+}
+
 /* In order to reduce some races, while at the same time doing additional
  * checking and hopefully speeding things up, we copy filenames to the
  * kernel data space before using them..
@@ -380,7 +402,7 @@
 	struct dentry * result;
 	struct inode *dir = parent->d_inode;
 
-	down(&dir->i_sem);
+	lock_dir(dir, name);
 	/*
 	 * First re-do the cached lookup just in case it was created
 	 * while we waited for the directory semaphore..
@@ -406,7 +428,7 @@
 		else
 			result = dentry;
 	}
-	up(&dir->i_sem);
+	unlock_dir(dir, name);
 	return result;
 }
 
@@ -414,7 +436,7 @@
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	up(&dir->i_sem);
+	unlock_dir(dir, name);
 	if (result->d_op && result->d_op->d_revalidate) {
 		if (!result->d_op->d_revalidate(result, nd) &&
 		    !d_invalidate(result)) {
 			dput(result);
@@ -1182,12 +1204,26 @@
 /*
  * p1 and p2 should be directories on the same fs.
  */
-struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
+struct dentry *lock_rename(struct dentry *p1, struct qstr *n1,
+			   struct dentry *p2, struct qstr *n2)
 {
 	struct dentry *p;
 
 	if (p1 == p2) {
-		down(&p1->d_inode->i_sem);
+		if (IS_PDIROPS(p1->d_inode)) {
+			unsigned int h1, h2;
+			h1 = n1->hash % p1->d_inode->i_sb->s_pdirops_size;
+			h2 = n2->hash % p2->d_inode->i_sb->s_pdirops_size;
+			if (h1 < h2) {
+				lock_dir(p1->d_inode, n1);
+				lock_dir(p2->d_inode, n2);
+			} else if (h1 > h2) {
+				lock_dir(p2->d_inode, n2);
+				lock_dir(p1->d_inode, n1);
+			} else
+				lock_dir(p1->d_inode, n1);
+		} else
+			down(&p1->d_inode->i_sem);
 		return NULL;
 	}
 
@@ -1195,31 +1231,35 @@
 
 	for (p = p1; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p2) {
-			down(&p2->d_inode->i_sem);
-			down(&p1->d_inode->i_sem);
+			lock_dir(p2->d_inode, n2);
+			lock_dir(p1->d_inode, n1);
 			return p;
 		}
 	}
 
 	for (p = p2; p->d_parent != p; p = p->d_parent) {
 		if (p->d_parent == p1) {
-			down(&p1->d_inode->i_sem);
-			down(&p2->d_inode->i_sem);
+			lock_dir(p1->d_inode, n1);
+			lock_dir(p2->d_inode, n2);
 			return p;
 		}
 	}
 
-	down(&p1->d_inode->i_sem);
-	down(&p2->d_inode->i_sem);
+	lock_dir(p1->d_inode, n1);
+	lock_dir(p2->d_inode, n2);
 	return NULL;
 }
 
-void unlock_rename(struct dentry *p1, struct dentry *p2)
+void unlock_rename(struct dentry *p1, struct qstr *n1,
+		   struct dentry *p2, struct qstr *n2)
 {
-	up(&p1->d_inode->i_sem);
+	unlock_dir(p1->d_inode, n1);
 	if (p1 != p2) {
-		up(&p2->d_inode->i_sem);
+		unlock_dir(p2->d_inode, n2);
 		up(&p1->d_inode->i_sb->s_vfs_rename_sem);
+	} else if (IS_PDIROPS(p1->d_inode) &&
+		   (n1->hash % p1->d_inode->i_sb->s_pdirops_size) !=
+		   (n2->hash % p2->d_inode->i_sb->s_pdirops_size)) {
+		/* lock_rename() took the second semaphore only when the
+		 * names mapped to different stripes, so compare stripes
+		 * here, not raw hashes */
+		unlock_dir(p2->d_inode, n2);
 	}
 }
 
@@ -1386,13 +1426,13 @@
 
 	dir = nd->dentry;
 	nd->flags &= ~LOOKUP_PARENT;
-	down(