Re: [PATCH] zero_user_page uses in fs/buffer.c and fs/libfs.c

2007-05-01 Thread Nate Diller

On 5/1/07, Nate Diller <[EMAIL PROTECTED]> wrote:

On 5/1/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Tue, 1 May 2007, Andrew Morton wrote:
>
> > As Satyam said, this will sometimes cause us to map and unmap the page
> > twice, and to run flush_dcache_page() twice.  In not-terribly-uncommon
> > circumstances in very frequently called functions.
> >
> > Doesn't seem worth it to me.
>
> Ok but we have that code three times. Should I add a variant of
> zero_user_page that zeroes everything but the section specified?
>
> zero_user_page_allbut() ?

The function already exists; it's called simple_prepare_write(), and I
thought there were patches in -mm to convert those callsites ...
although it looks like I let a review comment on that slide by.  I
need to redo a bunch of patches for re-submission anyway, so I guess
I'll deal with that tomorrow.

NATE



Well, leave it to me to reply too quickly, sorry.  I think we should
leave simple_prepare_write() the way it is, since it's a library
function itself.  The other two callsites in your patch deal with
buffers, which may themselves be smaller than a page, so you would
need a special function for just those two uses; there's no other way
to avoid making two calls to flush_dcache_page().  If it's tremendously
important to you to eliminate the open coding of these, maybe make a
'static int buffer_prepare_write()' or some such in fs/buffer.c.
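
For reference, the page-wide helper Christoph floated (zero_user_page_allbut(),
zeroing everything in the page except one range, with a single kmap and a
single flush_dcache_page()) might look roughly like the sketch below.  This is
only an illustration built on the standard 2.6-era page APIs, not code from
any posted patch:

static inline void zero_user_page_allbut(struct page *page,
					 unsigned from, unsigned to)
{
	char *kaddr = kmap_atomic(page, KM_USER0);

	/* zero [0, from) and [to, PAGE_CACHE_SIZE), leave [from, to) alone */
	memset(kaddr, 0, from);
	memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
	flush_dcache_page(page);
	kunmap_atomic(kaddr, KM_USER0);
}

For the buffer callsites discussed above, the two memset() ranges would
additionally have to be clamped to the buffer's boundaries rather than the
page's, which is why a separate helper would be needed there.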

NATE


Re: [PATCH] zero_user_page uses in fs/buffer.c and fs/libfs.c

2007-05-01 Thread Nate Diller

On 5/1/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:

On Tue, 1 May 2007, Andrew Morton wrote:

> As Satyam said, this will sometimes cause us to map and unmap the page
> twice, and to run flush_dcache_page() twice.  In not-terribly-uncommon
> circumstances in very frequently called functions.
>
> Doesn't seem worth it to me.

Ok but we have that code three times. Should I add a variant of
zero_user_page that zeroes everything but the section specified?

zero_user_page_allbut() ?


The function already exists; it's called simple_prepare_write(), and I
thought there were patches in -mm to convert those callsites ...
although it looks like I let a review comment on that slide by.  I
need to redo a bunch of patches for re-submission anyway, so I guess
I'll deal with that tomorrow.

NATE


Re: [PATCH 3/17] afs: convert afs_dir_get_page to read_kmap_page

2007-04-12 Thread Nate Diller

On 4/12/07, David Howells <[EMAIL PROTECTED]> wrote:

Nate Diller <[EMAIL PROTECTED]> wrote:

> Hmmm you're right.  Is your security work going into the next -mm?

I don't know.  Andrew hasn't said anything.  Andrew?  Are you waiting for it
to go through DaveM's networking tree?

> If so, I'll just re-base this cleanup patch on that ... at the very least I
> want to get rid of afs_dir_put_page().

That's reasonable.

> Also, did you consider passing the key pointer directly and modifying the
> readpage actor to simply cast the pointer back, like
> read_mapping_page(mapping, page, (struct file *)key)?  It seems like a waste
> to allocate a whole file struct on the stack just for the ->private field.

There's one small problem with that...  And that's filemap_nopage() (it passes
vma->vm_file to readpage() unconditionally).  Unless, of course, your patches
fix that too...


But you can't mmap() a directory anyway, so ... oh.  Interesting:
afs_file_readpage() does directories too.  The only thing I can think
of then is:

struct address_space_operations afs_file_aops = {
	.readpage	= afs_file_readpage,
};

struct address_space_operations afs_dir_aops = {
	.readpage	= afs_key_readpage,
};

int afs_file_readpage(struct file *file, struct page *page)
{
	return afs_key_readpage(file->private, page);
}

But that's a lot of code to avoid a single stack allocation.  The
whole fake file pointer thing still strikes me as a little ugly, and
you're definitely not the first one who has needed this sort of hackery.
Ugh.

NATE


Re: [PATCH 1/17] cramfs: use read_mapping_page

2007-04-12 Thread Nate Diller

On 4/12/07, Roman Zippel <[EMAIL PROTECTED]> wrote:

Hi,

On Thu, 12 Apr 2007, Christoph Hellwig wrote:

> On Wed, Apr 11, 2007 at 07:49:38PM -0700, Nate Diller wrote:
> > read_mapping_page_async() is going away, so convert its only user to
> > read_mapping_page().  This change has not been benchmarked; however, to
> > get real parallelism this wants something completely different, like
> > __do_page_cache_readahead(), which is not currently exported.
>
> Why is read_mapping_page_async going away?  This probably needs a lot more
> testing, and I'd be much happier if you split it out of the series and
> sent it separately at the end.

That function wasn't fully async anyway, as it would often sleep in
lock_page(). AFAICT only in the special case of a partially written page
would this function return a not-yet-uptodate page.


Yes, exactly: the structure of read_cache_page() and friends is
simply not appropriate for doing async I/O to more than one page at a
time, and the whole point of the special treatment in cramfs was to
read 4 pages at once rather than synchronously reading each of the 4
separately.  read_cache_page_async() is totally wrong for that use;
its purpose would be to get a reference to a single page that is
likely to be in cache already, without having to take the page lock.
It turns out nobody needs to do that, so there's no point in keeping it
around.

If the performance gain of reading all 4 pages at once would be worth
the effort, this code should be using __do_page_cache_readahead().
That function allocates all the pages first, then reads them in
asynchronously as a group.  It is currently not exported.
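
To make the intended win concrete, a batched variant could look roughly like
the sketch below.  start_group_readahead() is a made-up stand-in for whatever
batch-start helper (__do_page_cache_readahead() or an exported wrapper around
it) would actually be used, so this illustrates the two-phase idea rather than
being working code:

static void cramfs_read_group(struct address_space *mapping,
			      pgoff_t blocknr, struct page **pages)
{
	unsigned int i;

	/* phase 1: kick off asynchronous reads for the whole group
	 * (start_group_readahead() is hypothetical, see above) */
	start_group_readahead(mapping, blocknr, BLKS_PER_BUF);

	/* phase 2: wait for each page in turn; the reads now overlap
	 * instead of being issued and completed one at a time */
	for (i = 0; i < BLKS_PER_BUF; i++) {
		pages[i] = read_mapping_page(mapping, blocknr + i, NULL);
		if (IS_ERR(pages[i]))
			pages[i] = NULL;	/* mirror cramfs_read()'s error handling */
	}
}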

NATE


Re: [PATCH 7/17] jffs2: convert jffs2_gc_fetch_page to read_cache_page

2007-04-12 Thread Nate Diller

On 4/12/07, Phillip Lougher <[EMAIL PROTECTED]> wrote:

Nate Diller wrote:

> + page = read_cache_page(OFNI_EDONI_2SFFJ(f)->i_mapping,
> + start >> PAGE_CACHE_SHIFT,
> + (void *)jffs2_do_readpage_unlock,
> + OFNI_EDONI_2SFFJ(f));
>
> - if (IS_ERR(pg_ptr)) {
> + if (IS_ERR(page)) {
>   printk(KERN_WARNING "read_cache_page() returned error: %ld\n", 
PTR_ERR(pg_ptr));

should be

printk(KERN_WARNING "read_cache_page() returned error: %ld\n", 
PTR_ERR(page));

> - return PTR_ERR(pg_ptr);
> + return PTR_ERR(page);



Wow, you're right.  I was sure I compile-tested this ... oh, "depends
on MTD".  Oops.

Thanks for reviewing.  Does it look OK to you otherwise?

NATE


Re: [PATCH 3/17] afs: convert afs_dir_get_page to read_kmap_page

2007-04-12 Thread Nate Diller

On 4/12/07, David Howells <[EMAIL PROTECTED]> wrote:

Nate Diller <[EMAIL PROTECTED]> wrote:

> -static struct page *afs_dir_get_page(struct inode *dir, unsigned long index)

NAK.  This conflicts with my AFS security patches, and eliminates any way of
passing the key through to readpage().


Hmmm you're right.  Is your security work going into the next -mm?  If
so, I'll just re-base this cleanup patch on that ... at the very least
I want to get rid of afs_dir_put_page().  Also, did you consider
passing the key pointer directly and modifying the readpage actor to
simply cast the pointer back, like read_mapping_page(mapping, page,
(struct file *)key)?  It seems like a waste to allocate a whole file
struct on the stack just for the ->private field.

Andrew, in the meantime just disregard this patch.

NATE


[PATCH 6/17] hfs: remove redundant read_mapping_page error check

2007-04-11 Thread Nate Diller
Now that read_mapping_page() does error checking internally, there is no
need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfs/bnode.c 
linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfs/bnode.c 2007-04-09 17:20:13.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c2007-04-10 21:28:03.0 
-0700
@@ -282,10 +282,6 @@ static struct hfs_bnode *__hfs_bnode_cre
page = read_mapping_page(mapping, block++, NULL);
if (IS_ERR(page))
goto fail;
-   if (PageError(page)) {
-   page_cache_release(page);
-   goto fail;
-   }
page_cache_release(page);
node->page[i] = page;
}


[PATCH 7/17] jffs2: convert jffs2_gc_fetch_page to read_cache_page

2007-04-11 Thread Nate Diller
Replace jffs2_gc_fetch_page() and jffs2_gc_release_page() with the
read_cache_page() and put_kmapped_page() calls, and update the call site
accordingly.  Explicit calls to kmap()/kunmap() make the code clearer.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/fs.c 
linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/fs.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c 2007-04-06 01:59:19.0 
-0700
@@ -621,33 +621,6 @@ struct jffs2_inode_info *jffs2_gc_fetch_
return JFFS2_INODE_INFO(inode);
 }
 
-unsigned char *jffs2_gc_fetch_page(struct jffs2_sb_info *c,
-  struct jffs2_inode_info *f,
-  unsigned long offset,
-  unsigned long *priv)
-{
-   struct inode *inode = OFNI_EDONI_2SFFJ(f);
-   struct page *pg;
-
-   pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
-(void *)jffs2_do_readpage_unlock, inode);
-   if (IS_ERR(pg))
-   return (void *)pg;
-
-   *priv = (unsigned long)pg;
-   return kmap(pg);
-}
-
-void jffs2_gc_release_page(struct jffs2_sb_info *c,
-  unsigned char *ptr,
-  unsigned long *priv)
-{
-   struct page *pg = (void *)*priv;
-
-   kunmap(pg);
-   page_cache_release(pg);
-}
-
 static int jffs2_flash_setup(struct jffs2_sb_info *c) {
int ret = 0;
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/gc.c 
linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/gc.c  2007-04-05 17:13:10.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c 2007-04-06 01:59:19.0 
-0700
@@ -1078,7 +1078,7 @@ static int jffs2_garbage_collect_dnode(s
uint32_t alloclen, offset, orig_end, orig_start;
int ret = 0;
unsigned char *comprbuf = NULL, *writebuf;
-   unsigned long pg;
+   struct page *page;
unsigned char *pg_ptr;
 
memset(&ri, 0, sizeof(ri));
@@ -1219,12 +1219,16 @@ static int jffs2_garbage_collect_dnode(s
 *page OK. We'll actually write it out again in commit_write, which 
is a little
 *suboptimal, but at least we're correct.
 */
-   pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+   page = read_cache_page(OFNI_EDONI_2SFFJ(f)->i_mapping,
+   start >> PAGE_CACHE_SHIFT,
+   (void *)jffs2_do_readpage_unlock,
+   OFNI_EDONI_2SFFJ(f));
 
-   if (IS_ERR(pg_ptr)) {
+   if (IS_ERR(page)) {
printk(KERN_WARNING "read_cache_page() returned error: %ld\n", 
PTR_ERR(pg_ptr));
-   return PTR_ERR(pg_ptr);
+   return PTR_ERR(page);
}
+   pg_ptr = kmap(page);
 
offset = start;
while(offset < orig_end) {
@@ -1287,6 +1291,7 @@ static int jffs2_garbage_collect_dnode(s
}
}
 
-   jffs2_gc_release_page(c, pg_ptr, &pg);
+   kunmap(page);
+   page_cache_release(page);
return ret;
 }


[PATCH 11/17] ntfs: convert ntfs_map_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace ntfs_map_page() and ntfs_unmap_page() with the new read_kmap_page()
and put_kmapped_page() calls and their locking variants, and remove
unneeded PageError checking.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/aops.h 
linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h
--- linux-2.6.21-rc5-mm4/fs/ntfs/aops.h 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h2007-04-06 01:59:19.0 
-0700
@@ -31,73 +31,6 @@
 
 #include "inode.h"
 
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page:  the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping:   address space for which to obtain the page
- * @index: index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_CACHE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the readpage
- * method defined in the address space operations of @mapping and the page is
- * added to the page cache of @mapping in the process.
- *
- * If the page belongs to an mst protected attribute and it is marked as such
- * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
- * error checking is performed.  This means the caller has to verify whether
- * the ntfs record(s) contained in the page are valid or not using one of the
- * ntfs_is__record{,p}() macros, where  is the record type you are
- * expecting to see.  (For details of the macros, see fs/ntfs/layout.h.)
- *
- * If the page is in high memory it is mapped into memory directly addressible
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to 'true', the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
-   unsigned long index)
-{
-   struct page *page = read_mapping_page(mapping, index, NULL);
-
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageError(page))
-   return page;
-   ntfs_unmap_page(page);
-   return ERR_PTR(-EIO);
-   }
-   return page;
-}
-
 #ifdef NTFS_RW
 
 extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c 
linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c
--- linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c   2006-11-29 13:57:37.0 
-0800
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c  2007-04-06 12:40:53.0 
-0700
@@ -72,7 +72,7 @@ int __ntfs_bitmap_set_bits_in_run(struct
 
/* Get the page containing the first bit (@start_bit). */
mapping = vi->i_mapping;
-   page = ntfs_map_page(mapping, index);
+   page = read_kmap_page(mapping, index);
if (IS_ERR(page)) {
if (!is_rollback)
ntfs_error(vi->i_sb, "Failed to map first page (error "
@@ -123,8 +123,8 @@ int __ntfs_bitmap_set_bits_in_run(struct
/* Update @index and get the next page. */
flush_dcache_page(page);
set_page_dirty(page);
-   ntfs_unmap_page(page);
-   page = ntfs_map_page(mapping, ++index);
+   put_kmapped_page(page);
+   page = read_kmap_page(mapping, ++index);
if (IS_ERR(page))
goto rollback;
kaddr = page_address(page);
@@ -159,7 +159,7 @@ done:
/* We are done.  Unmap the page and return success. */
flush_dcache_page(page);
set_page_dirty(page);
-   ntfs_unmap_page(page);
+   put_kmapped_page(page);
ntfs_debug("Done.");
return 0;
 rollback:
diff -urpN -X d

[PATCH 1/17] cramfs: use read_mapping_page

2007-04-11 Thread Nate Diller
read_mapping_page_async() is going away, so convert its only user to
read_mapping_page().  This change has not been benchmarked; however, to
get real parallelism this wants something completely different, like
__do_page_cache_readahead(), which is not currently exported.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/cramfs/inode.c 
linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c
--- linux-2.6.21-rc6-mm1/fs/cramfs/inode.c  2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c 2007-04-09 21:37:09.0 
-0700
@@ -180,8 +180,7 @@ static void *cramfs_read(struct super_bl
struct page *page = NULL;
 
if (blocknr + i < devsize) {
-   page = read_mapping_page_async(mapping, blocknr + i,
-   NULL);
+   page = read_mapping_page(mapping, blocknr + i, NULL);
/* synchronous error? */
if (IS_ERR(page))
page = NULL;


[PATCH 5/17] hfsplus: remove redundant read_mapping_page error check

2007-04-11 Thread Nate Diller
Now that read_mapping_page() does error checking internally, there is no
need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]> 

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c 
linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c 2007-04-09 17:20:13.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c2007-04-10 
21:28:45.0 -0700
@@ -442,10 +442,6 @@ static struct hfs_bnode *__hfs_bnode_cre
page = read_mapping_page(mapping, block, NULL);
if (IS_ERR(page))
goto fail;
-   if (PageError(page)) {
-   page_cache_release(page);
-   goto fail;
-   }
page_cache_release(page);
node->page[i] = page;
}


[PATCH 14/17] reiserfs: convert reiserfs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace reiserfs_get_page() and reiserfs_put_page() with the new
read_kmap_page() and put_kmapped_page() calls and their locking variants.
Also, propagate the gfp_mask deadlock comment to the call sites.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c 
linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c
--- linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c2007-04-05 17:14:25.0 
-0700
+++ linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c   2007-04-06 
14:41:34.0 -0700
@@ -438,33 +438,6 @@ int xattr_readdir(struct file *file, fil
return res;
 }
 
-/* Internal operations on file data */
-static inline void reiserfs_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
-static struct page *reiserfs_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page;
-   /* We can deadlock if we try to free dentries,
-  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
-   mapping_set_gfp_mask(mapping, GFP_NOFS);
-   page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
-  fail:
-   reiserfs_put_page(page);
-   return ERR_PTR(-EIO);
-}
-
 static inline __u32 xattr_hash(const char *msg, int len)
 {
return csum_partial(msg, len, 0);
@@ -537,13 +510,15 @@ reiserfs_xattr_set(struct inode *inode, 
else
chunk = buffer_size - buffer_pos;
 
-   page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+   /* We can deadlock if we try to free dentries,
+  and an unlink/rmdir has just occured - GFP_NOFS avoids this 
*/
+   mapping_set_gfp_mask(mapping, GFP_NOFS);
+   page = __read_kmap_page(mapping, file_pos >> PAGE_CACHE_SHIFT);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out_filp;
}
 
-   lock_page(page);
data = page_address(page);
 
if (file_pos == 0) {
@@ -566,8 +541,7 @@ reiserfs_xattr_set(struct inode *inode, 
 page_offset + chunk +
 skip);
}
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
buffer_pos += chunk;
file_pos += chunk;
skip = 0;
@@ -646,13 +620,15 @@ reiserfs_xattr_get(const struct inode *i
else
chunk = isize - file_pos;
 
-   page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+   /* We can deadlock if we try to free dentries,
+  and an unlink/rmdir has just occured - GFP_NOFS avoids this 
*/
+   mapping_set_gfp_mask(xinode->i_mapping, GFP_NOFS);
+   page = __read_kmap_page(xinode->i_mapping, file_pos >> 
PAGE_CACHE_SHIFT);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out_dput;
}
 
-   lock_page(page);
data = page_address(page);
if (file_pos == 0) {
struct reiserfs_xattr_header *rxh =
@@ -661,8 +637,7 @@ reiserfs_xattr_get(const struct inode *i
chunk -= skip;
/* Magic doesn't match up.. */
if (rxh->h_magic != cpu_to_le32(REISERFS_XATTR_MAGIC)) {
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
reiserfs_warning(inode->i_sb,
 "Invalid magic for xattr (%s) "
 "associated with %k", name,
@@ -673,8 +648,7 @@ reiserfs_xattr_get(const struct inode *i
hash = le32_to_cpu(rxh->h_hash);
}
memcpy(buffer + buffer_pos, data + skip, chunk);
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
file_pos += chunk;
buffer_pos += chunk;
skip = 0;


[PATCH 17/17] vxfs: convert vxfs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace vxfs_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h  2007-04-05 
17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h 2007-04-06 
01:59:19.0 -0700
@@ -69,7 +69,6 @@ extern const struct file_operations   vxfs
 extern int vxfs_read_olt(struct super_block *, u_long);
 
 /* vxfs_subr.c */
-extern struct page *   vxfs_get_page(struct address_space *, u_long);
 extern voidvxfs_put_page(struct page *);
 extern struct buffer_head *vxfs_bread(struct inode *, int);
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c   2007-04-05 
17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c  2007-04-06 
01:59:19.0 -0700
@@ -138,7 +138,7 @@ __vxfs_iget(ino_t ino, struct inode *ili
u_long  offset;
 
offset = (ino % (PAGE_SIZE / VXFS_ISIZE)) * VXFS_ISIZE;
-   pp = vxfs_get_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
+   pp = read_kmap_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
 
if (!IS_ERR(pp)) {
struct vxfs_inode_info  *vip;
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c  2007-04-05 
17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c 2007-04-06 
01:59:19.0 -0700
@@ -125,7 +125,7 @@ vxfs_find_entry(struct inode *ip, struct
caddr_t kaddr;
struct page *pp;
 
-   pp = vxfs_get_page(ip->i_mapping, page);
+   pp = read_kmap_page(ip->i_mapping, page);
if (IS_ERR(pp))
continue;
kaddr = (caddr_t)page_address(pp);
@@ -280,7 +280,7 @@ vxfs_readdir(struct file *fp, void *retp
caddr_t kaddr;
struct page *pp;
 
-   pp = vxfs_get_page(ip->i_mapping, page);
+   pp = read_kmap_page(ip->i_mapping, page);
if (IS_ERR(pp))
continue;
kaddr = (caddr_t)page_address(pp);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c2007-04-05 
17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c   2007-04-06 
01:59:19.0 -0700
@@ -56,39 +56,6 @@ vxfs_put_page(struct page *pp)
 }
 
 /**
- * vxfs_get_page - read a page into memory.
- * @ip:inode to read from
- * @n: page number
- *
- * Description:
- *   vxfs_get_page reads the @n th page of @ip into the pagecache.
- *
- * Returns:
- *   The wanted page on success, else a NULL pointer.
- */
-struct page *
-vxfs_get_page(struct address_space *mapping, u_long n)
-{
-   struct page *   pp;
-
-   pp = read_mapping_page(mapping, n, NULL);
-
-   if (!IS_ERR(pp)) {
-   kmap(pp);
-   /** if (!PageChecked(pp)) **/
-   /** vxfs_check_page(pp); **/
-   if (PageError(pp))
-   goto fail;
-   }
-   
-   return (pp);
-
-fail:
-   vxfs_put_page(pp);
-   return ERR_PTR(-EIO);
-}
-
-/**
  * vxfs_bread - read buffer for a give inode,block tuple
  * @ip:inode
  * @block: logical block


[PATCH 15/17] sysv: convert dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace sysv dir_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/sysv/dir.c 
linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c
--- linux-2.6.21-rc5-mm4/fs/sysv/dir.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c 2007-04-06 01:59:19.0 
-0700
@@ -50,15 +50,6 @@ static int dir_commit_chunk(struct page 
return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page))
-   kmap(page);
-   return page;
-}
-
 static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
unsigned long pos = filp->f_pos;
@@ -77,7 +68,7 @@ static int sysv_readdir(struct file * fi
for ( ; n < npages; n++, offset = 0) {
char *kaddr, *limit;
struct sysv_dir_entry *de;
-   struct page *page = dir_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
if (IS_ERR(page))
continue;
@@ -149,7 +140,7 @@ struct sysv_dir_entry *sysv_find_entry(s
 
do {
char *kaddr;
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
if (!IS_ERR(page)) {
kaddr = (char*)page_address(page);
de = (struct sysv_dir_entry *) kaddr;
@@ -191,7 +182,7 @@ int sysv_add_link(struct dentry *dentry,
 
/* We take care of directory expansion in the same loop */
for (n = 0; n <= npages; n++) {
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
err = PTR_ERR(page);
if (IS_ERR(page))
goto out;
@@ -299,7 +290,7 @@ int sysv_empty_dir(struct inode * inode)
for (i = 0; i < npages; i++) {
char *kaddr;
struct sysv_dir_entry * de;
-   page = dir_get_page(inode, i);
+   page = read_kmap_page(inode->i_mapping, i);
 
if (IS_ERR(page))
continue;
@@ -353,7 +344,7 @@ void sysv_set_link(struct sysv_dir_entry
 
 struct sysv_dir_entry * sysv_dotdot (struct inode *dir, struct page **p)
 {
-   struct page *page = dir_get_page(dir, 0);
+   struct page *page = read_kmap_page(dir->i_mapping, 0);
struct sysv_dir_entry *de = NULL;
 
if (!IS_ERR(page)) {


[PATCH 16/17] ufs: convert ufs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace ufs_get_page()/ufs_get_locked_page() and
ufs_put_page()/ufs_put_locked_page() with the new read_kmap_page() and
put_kmapped_page() calls and their locking variants.  Also, change the
ufs_check_page() call to return the page's error status, and update the
call sites accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/balloc.c 
linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c
--- linux-2.6.21-rc5-mm4/fs/ufs/balloc.c2007-04-05 17:13:29.0 
-0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c   2007-04-06 12:46:02.0 
-0700
@@ -272,7 +272,7 @@ static void ufs_change_blocknr(struct in
index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
 
if (likely(cur_index != index)) {
-   page = ufs_get_locked_page(mapping, index);
+   page = __read_mapping_page(mapping, index, NULL);
if (!page)/* it was truncated */
continue;
if (IS_ERR(page)) {/* or EIO */
@@ -325,8 +325,10 @@ static void ufs_change_blocknr(struct in
bh = bh->b_this_page;
} while (bh != head);
 
-   if (likely(cur_index != index))
-   ufs_put_locked_page(page);
+   if (likely(cur_index != index)) {
+   unlock_page(page);
+   page_cache_release(page);
+   }
}
UFSD("EXIT\n");
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/truncate.c 
linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c
--- linux-2.6.21-rc5-mm4/fs/ufs/truncate.c  2007-04-05 17:13:29.0 
-0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c 2007-04-06 12:46:14.0 
-0700
@@ -395,8 +395,9 @@ static int ufs_alloc_lastblock(struct in
 
lastfrag--;
 
-   lastpage = ufs_get_locked_page(mapping, lastfrag >>
-  (PAGE_CACHE_SHIFT - inode->i_blkbits));
+   lastpage = __read_mapping_page(mapping, lastfrag >>
+  (PAGE_CACHE_SHIFT - inode->i_blkbits),
+  NULL);
if (IS_ERR(lastpage)) {
err = -EIO;
goto out;
@@ -441,7 +442,8 @@ static int ufs_alloc_lastblock(struct in
   }
}
 out_unlock:
-   ufs_put_locked_page(lastpage);
+   unlock_page(lastpage);
+   page_cache_release(lastpage);
 out:
return err;
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.c 
linux-2.6.21-rc5-mm4-test/fs/ufs/util.c
--- linux-2.6.21-rc5-mm4/fs/ufs/util.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.c 2007-04-06 12:40:53.0 
-0700
@@ -232,55 +232,3 @@ ufs_set_inode_dev(struct super_block *sb
ufsi->i_u1.i_data[0] = cpu_to_fs32(sb, fs32);
 }
 
-/**
- * ufs_get_locked_page() - locate, pin and lock a pagecache page, if not exist
- * read it from disk.
- * @mapping: the address_space to search
- * @index: the page index
- *
- * Locates the desired pagecache page, if not exist we'll read it,
- * locks it, increments its reference
- * count and returns its address.
- *
- */
-
-struct page *ufs_get_locked_page(struct address_space *mapping,
-pgoff_t index)
-{
-   struct page *page;
-
-   page = find_lock_page(mapping, index);
-   if (!page) {
-   page = read_mapping_page(mapping, index, NULL);
-
-   if (IS_ERR(page)) {
-   printk(KERN_ERR "ufs_change_blocknr: "
-  "read_mapping_page error: ino %lu, index: %lu\n",
-  mapping->host->i_ino, index);
-   goto out;
-   }
-
-   lock_page(page);
-
-   if (unlikely(page->mapping == NULL)) {
-   /* Truncate got there first */
-   unlock_page(page);
-   page_cache_release(page);
-   page = NULL;
-   goto out;
-   }
-
-   if (!PageUptodate(page) || PageError(page)) {
-   unlock_page(page);
-   page_cache_release(page);
-
-   printk(KERN_ERR "ufs_change_blocknr: "
-  "can not read page: ino %lu, index: %lu\n",
-  mapping->host->i_ino, index);
-
-   page = ERR_PTR(-EIO);
-   }
-   }
-out:
-   return page;
-}
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.h 
linux-2.6.21-rc5-mm4-test/fs/ufs/util.h
--- linux-2.6.21-rc5-mm4/fs/ufs/util.h  2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.h 2007-04-06 12:46:36.0 

[PATCH 13/17] reiser4: remove redundant read_mapping_page error checks

2007-04-11 Thread Nate Diller
read_mapping_page() is now fully synchronous, so there's no need to wait
for the page lock or check for I/O errors.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c   
2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c  
2007-04-10 21:33:47.0 -0700
@@ -608,14 +608,6 @@ int extent2tail(unix_file_info_t *uf_inf
break;
}
 
-   wait_on_page_locked(page);
-
-   if (!PageUptodate(page)) {
-   page_cache_release(page);
-   result = RETERR(-EIO);
-   break;
-   }
-
/* cut part of file we have read */
start_byte = (__u64) (i << PAGE_CACHE_SHIFT);
set_key_offset(&from, start_byte);
diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c   
2007-04-10 19:41:14.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c  
2007-04-10 21:38:41.0 -0700
@@ -1220,15 +1220,8 @@ int reiser4_read_extent(struct file *fil
page = read_mapping_page(mapping, cur_page, file);
if (IS_ERR(page))
return PTR_ERR(page);
-   lock_page(page);
-   if (!PageUptodate(page)) {
-   unlock_page(page);
-   page_cache_release(page);
-   warning("jmacd-97178", "extent_read: page is not up to 
date");
-   return RETERR(-EIO);
-   }
+
mark_page_accessed(page);
-   unlock_page(page);
 
/* If users can be writing to this page using arbitrary virtual
   addresses, take care about potential aliasing before reading


[PATCH 12/17] partition: remove redundant read_mapping_page error checks

2007-04-11 Thread Nate Diller
Remove unneeded PageError checking in read_dev_sector(), and clean up the
code a bit.

Can anyone point out why it's OK to use page_address() here on a page which
has not been kmapped?  If it's not OK, then a good number of callers need to
be fixed.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/partitions/check.c 
linux-2.6.21-rc6-mm1-test/fs/partitions/check.c
--- linux-2.6.21-rc6-mm1/fs/partitions/check.c  2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/partitions/check.c 2007-04-10 
21:59:01.0 -0700
@@ -568,16 +568,12 @@ unsigned char *read_dev_sector(struct bl
 
page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)),
 NULL);
-   if (!IS_ERR(page)) {
-   if (PageError(page))
-   goto fail;
-   p->v = page;
-   return (unsigned char *)page_address(page) +  ((n & ((1 << 
(PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
-fail:
-   page_cache_release(page);
+   if (IS_ERR(page)) {
+   p->v = NULL;
+   return NULL;
}
-   p->v = NULL;
-   return NULL;
+   p->v = page;
+   return (unsigned char *)page_address(page) +  ((n & ((1 << 
(PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
 }
 
 EXPORT_SYMBOL(read_dev_sector);


[PATCH 9/17] minix: convert dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace minix dir_get_page() and dir_put_page() with the new
read_kmap_page() and put_kmapped_page()/put_locked_page() calls.  Also, use
__read_kmap_page() instead of re-taking the page lock.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/minix/dir.c 
linux-2.6.21-rc5-mm4-test/fs/minix/dir.c
--- linux-2.6.21-rc5-mm4/fs/minix/dir.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/minix/dir.c2007-04-06 02:31:55.0 
-0700
@@ -23,12 +23,6 @@ const struct file_operations minix_dir_o
.fsync  = minix_sync_file,
 };
 
-static inline void dir_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
 /*
  * Return the offset into page `page_nr' of the last valid
  * byte in that page, plus one.
@@ -60,22 +54,6 @@ static int dir_commit_chunk(struct page 
return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageUptodate(page))
-   goto fail;
-   }
-   return page;
-
-fail:
-   dir_put_page(page);
-   return ERR_PTR(-EIO);
-}
-
 static inline void *minix_next_entry(void *de, struct minix_sb_info *sbi)
 {
return (void*)((char*)de + sbi->s_dirsize);
@@ -102,7 +80,7 @@ static int minix_readdir(struct file * f
 
for ( ; n < npages; n++, offset = 0) {
char *p, *kaddr, *limit;
-   struct page *page = dir_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
if (IS_ERR(page))
continue;
@@ -128,12 +106,12 @@ static int minix_readdir(struct file * f
(n << PAGE_CACHE_SHIFT) | offset,
inumber, DT_UNKNOWN);
if (over) {
-   dir_put_page(page);
+   put_kmapped_page(page);
goto done;
}
}
}
-   dir_put_page(page);
+   put_kmapped_page(page);
}
 
 done:
@@ -177,7 +155,7 @@ minix_dirent *minix_find_entry(struct de
for (n = 0; n < npages; n++) {
char *kaddr, *limit;
 
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
if (IS_ERR(page))
continue;
 
@@ -198,7 +176,7 @@ minix_dirent *minix_find_entry(struct de
if (namecompare(namelen, sbi->s_namelen, name, namx))
goto found;
}
-   dir_put_page(page);
+   put_kmapped_page(page);
}
return NULL;
 
@@ -233,11 +211,10 @@ int minix_add_link(struct dentry *dentry
for (n = 0; n <= npages; n++) {
char *limit, *dir_end;
 
-   page = dir_get_page(dir, n);
+   page = __read_kmap_page(dir->i_mapping, n);
err = PTR_ERR(page);
if (IS_ERR(page))
goto out;
-   lock_page(page);
kaddr = (char*)page_address(page);
dir_end = kaddr + minix_last_byte(dir, n);
limit = kaddr + PAGE_CACHE_SIZE - sbi->s_dirsize;
@@ -265,8 +242,7 @@ int minix_add_link(struct dentry *dentry
if (namecompare(namelen, sbi->s_namelen, name, namx))
goto out_unlock;
}
-   unlock_page(page);
-   dir_put_page(page);
+   put_locked_page(page);
}
BUG();
return -EINVAL;
@@ -288,13 +264,12 @@ got_it:
err = dir_commit_chunk(page, from, to);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
-out_put:
-   dir_put_page(page);
+   put_kmapped_page(page);
 out:
return err;
 out_unlock:
-   unlock_page(page);
-   goto out_put;
+   put_locked_page(page);
+   return err;
 }
 
 int minix_delete_entry(struct minix_dir_entry *de, struct page *page)
@@ -314,7 +289,7 @@ int minix_delete_entry(struct minix_dir_
} else {
unlock_page(page);
}
-   dir_put_page(page);
+   put_kmapped_page(page);
inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
return err;
@@ -378,7 +353,7 @@ int minix_empty_dir(struct inode * inode
for (i = 0; i < npages; i++) {
char *p, *kaddr, *limit;
 
-   page = dir_get_page(inode, i);
+   page = read_kma

[PATCH 10/17] mtd: convert page_read to read_kmap_page

2007-04-11 Thread Nate Diller
Replace page_read() with read_kmap_page()/__read_kmap_page().  This probably
fixes behaviour on highmem systems, since page_address() was being used
without kmap().  Also eliminate the need to re-take the page lock during
writes to the page.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c 
linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c
--- linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c2007-04-05 
17:14:24.0 -0700
+++ linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c   2007-04-06 
01:59:19.0 -0700
@@ -39,12 +39,6 @@ struct block2mtd_dev {
 /* Static info about the MTD, used in cleanup_module */
 static LIST_HEAD(blkmtd_device_list);
 
-
-static struct page *page_read(struct address_space *mapping, int index)
-{
-   return read_mapping_page(mapping, index, NULL);
-}
-
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
@@ -56,23 +50,19 @@ static int _block2mtd_erase(struct block
u_long *max;
 
while (pages) {
-   page = page_read(mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = __read_kmap_page(mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
max = page_address(page) + PAGE_SIZE;
for (p=page_address(page); pblkdev->bd_inode->i_mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = read_kmap_page(dev->blkdev->bd_inode->i_mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
memcpy(buf, page_address(page) + offset, cpylen);
-   page_cache_release(page);
+   put_kmapped_page(page);
 
if (retlen)
*retlen += cpylen;
@@ -163,19 +151,15 @@ static int _block2mtd_write(struct block
cpylen = len;   // this page
len = len - cpylen;
 
-   page = page_read(mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = __read_kmap_page(mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
if (memcmp(page_address(page)+offset, buf, cpylen)) {
-   lock_page(page);
memcpy(page_address(page) + offset, buf, cpylen);
set_page_dirty(page);
-   unlock_page(page);
}
-   page_cache_release(page);
+   put_locked_page(page);
 
if (retlen)
*retlen += cpylen;


[PATCH 4/17] ext2: convert ext2_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace ext2_get_page() and ext2_put_page() with the new read_kmap_page()
and put_kmapped_page() calls.  Also, change the ext2_check_page() call to
return the page's error status, and update the call sites accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ext2/dir.c 
linux-2.6.21-rc5-mm4-test/fs/ext2/dir.c
--- linux-2.6.21-rc5-mm4/fs/ext2/dir.c  2007-04-06 12:27:03.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ext2/dir.c 2007-04-06 14:34:23.0 
-0700
@@ -35,12 +35,6 @@ static inline unsigned ext2_chunk_size(s
return inode->i_sb->s_blocksize;
 }
 
-static inline void ext2_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
 static inline unsigned long dir_pages(struct inode *inode)
 {
return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
@@ -74,7 +68,7 @@ static int ext2_commit_chunk(struct page
return err;
 }
 
-static void ext2_check_page(struct page *page)
+static int ext2_check_page(struct page *page)
 {
struct inode *dir = page->mapping->host;
struct super_block *sb = dir->i_sb;
@@ -86,6 +80,14 @@ static void ext2_check_page(struct page 
ext2_dirent *p;
char *error;
 
+   if (likely(PageChecked(page))) {
+   if (likely(!PageError(page)))
+   return 0;
+
+   put_kmapped_page(page);
+   return -EIO;
+   }
+
if ((dir->i_size >> PAGE_CACHE_SHIFT) == page->index) {
limit = dir->i_size & ~PAGE_CACHE_MASK;
if (limit & (chunk_size - 1))
@@ -112,7 +114,7 @@ static void ext2_check_page(struct page 
goto Eend;
 out:
SetPageChecked(page);
-   return;
+   return 0;
 
/* Too bad, we had an error */
 
@@ -153,24 +155,8 @@ Eend:
 fail:
SetPageChecked(page);
SetPageError(page);
-}
-
-static struct page * ext2_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageChecked(page))
-   ext2_check_page(page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
-fail:
-   ext2_put_page(page);
-   return ERR_PTR(-EIO);
+   put_kmapped_page(page);
+   return -EIO;
 }
 
 /*
@@ -262,9 +248,9 @@ ext2_readdir (struct file * filp, void *
for ( ; n < npages; n++, offset = 0) {
char *kaddr, *limit;
ext2_dirent *de;
-   struct page *page = ext2_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
-   if (IS_ERR(page)) {
+   if (IS_ERR(page) || ext2_check_page(page)) {
ext2_error(sb, __FUNCTION__,
   "bad page in #%lu",
   inode->i_ino);
@@ -286,7 +272,7 @@ ext2_readdir (struct file * filp, void *
if (de->rec_len == 0) {
ext2_error(sb, __FUNCTION__,
"zero-length directory entry");
-   ext2_put_page(page);
+   put_kmapped_page(page);
return -EIO;
}
if (de->inode) {
@@ -301,13 +287,13 @@ ext2_readdir (struct file * filp, void *
(n<<PAGE_CACHE_SHIFT) | offset,
le32_to_cpu(de->inode), d_type);
if (over) {
-   ext2_put_page(page);
+   put_kmapped_page(page);
return 0;
}
}
filp->f_pos += le16_to_cpu(de->rec_len);
}
-   ext2_put_page(page);
+   put_kmapped_page(page);
}
return 0;
 }
@@ -344,8 +330,8 @@ struct ext2_dir_entry_2 * ext2_find_entr
n = start;
do {
char *kaddr;
-   page = ext2_get_page(dir, n);
-   if (!IS_ERR(page)) {
+   page = read_kmap_page(dir->i_mapping, n);
+   if (!IS_ERR(page) && !ext2_check_page(page)) {
kaddr = page_address(page);
de = (ext2_dirent *) kaddr;
kaddr += ext2_last_byte(dir, n) - reclen;
@@ -353,14 +339,14 @@ struct ext2_dir_entry_2 * ext2_find_entr
if (de->rec_len == 0) {
ext2_error(dir->i_sb, __FUNCTION__,
 

[PATCH 8/17] jfs: use locking read_mapping_page

2007-04-11 Thread Nate Diller
Use the new locking variant of read_mapping_page to avoid doing extra work.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/jfs/jfs_metapage.c 
linux-2.6.21-rc6-mm1-test/fs/jfs/jfs_metapage.c
--- linux-2.6.21-rc6-mm1/fs/jfs/jfs_metapage.c  2007-04-09 17:23:48.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/jfs/jfs_metapage.c 2007-04-09 
21:37:09.0 -0700
@@ -632,12 +632,11 @@ struct metapage *__get_metapage(struct i
}
SetPageUptodate(page);
} else {
-   page = read_mapping_page(mapping, page_index, NULL);
-   if (IS_ERR(page) || !PageUptodate(page)) {
+   page = __read_mapping_page(mapping, page_index, NULL);
+   if (IS_ERR(page)) {
jfs_err("read_mapping_page failed!");
return NULL;
}
-   lock_page(page);
}
 
mp = page_to_mp(page, page_offset);


[PATCH 2/17] fs: introduce new read_cache_page interface

2007-04-11 Thread Nate Diller
Export a single version of read_cache_page, which returns with a locked,
Uptodate page or a synchronous error, and use inline helper functions to
replicate the old behavior.  Also, introduce new helper functions for the
most common file system uses, which include kmapping the page, as well as
needing to keep the page locked.  These changes collectively eliminate a
substantial amount of private fs logic in favor of generic code.

It also simplifies filemap.c significantly, by assuming that callers want
synchronous behavior, which is true for all but one caller anyway.
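
As a sketch of the intended conversion (example_fs and its helper are
hypothetical stand-ins for the per-fs code this series removes; the
read_kmap_page()/put_kmapped_page() helpers are the ones added below):

	/* Before: each fs open-codes read + kmap + PageError handling */
	static struct page *example_get_page(struct inode *dir, unsigned long n)
	{
		struct page *page = read_mapping_page(dir->i_mapping, n, NULL);

		if (!IS_ERR(page)) {
			kmap(page);
			if (PageError(page)) {
				kunmap(page);
				page_cache_release(page);
				return ERR_PTR(-EIO);
			}
		}
		return page;
	}

	/* After: the generic helper returns a kmapped, uptodate page or an
	 * ERR_PTR, so the private wrapper goes away entirely */
	static int example_read_chunk(struct inode *dir, unsigned long n)
	{
		struct page *page = read_kmap_page(dir->i_mapping, n);

		if (IS_ERR(page))
			return PTR_ERR(page);
		/* ... work on page_address(page) ... */
		put_kmapped_page(page);
		return 0;
	}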

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/include/linux/pagemap.h 
linux-2.6.21-rc6-mm1-test/include/linux/pagemap.h
--- linux-2.6.21-rc6-mm1/include/linux/pagemap.h2007-04-11 
14:22:19.0 -0700
+++ linux-2.6.21-rc6-mm1-test/include/linux/pagemap.h   2007-04-11 
14:29:31.0 -0700
@@ -108,21 +108,30 @@ static inline struct page *grab_cache_pa
 
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
unsigned long index);
-extern struct page * read_cache_page_async(struct address_space *mapping,
-   unsigned long index, filler_t *filler,
-   void *data);
-extern struct page * read_cache_page(struct address_space *mapping,
+extern struct page *__read_cache_page(struct address_space *mapping,
unsigned long index, filler_t *filler,
void *data);
 extern int read_cache_pages(struct address_space *mapping,
struct list_head *pages, filler_t *filler, void *data);
 
-static inline struct page *read_mapping_page_async(
-   struct address_space *mapping,
+void fastcall unlock_page(struct page *page);
+static inline struct page *read_cache_page(struct address_space *mapping,
+   unsigned long index, filler_t *filler,
+   void *data)
+{
+   struct page *page;
+
+   page = __read_cache_page(mapping, index, filler, data);
+   if (!IS_ERR(page))
+   unlock_page(page);
+   return page;
+}
+
+static inline struct page *__read_mapping_page(struct address_space *mapping,
 unsigned long index, void *data)
 {
filler_t *filler = (filler_t *)mapping->a_ops->readpage;
-   return read_cache_page_async(mapping, index, filler, data);
+   return __read_cache_page(mapping, index, filler, data);
 }
 
 static inline struct page *read_mapping_page(struct address_space *mapping,
@@ -132,6 +141,36 @@ static inline struct page *read_mapping_
return read_cache_page(mapping, index, filler, data);
 }
 
+static inline struct page *__read_kmap_page(struct address_space *mapping,
+ unsigned long index)
+{
+   struct page *page = __read_mapping_page(mapping, index, NULL);
+   if (!IS_ERR(page))
+   kmap(page);
+   return page;
+}
+
+static inline struct page *read_kmap_page(struct address_space *mapping,
+ unsigned long index)
+{
+   struct page *page = read_mapping_page(mapping, index, NULL);
+   if (!IS_ERR(page))
+   kmap(page);
+   return page;
+}
+
+static inline void put_kmapped_page(struct page *page)
+{
+   kunmap(page);
+   page_cache_release(page);
+}
+
+static inline void put_locked_page(struct page *page)
+{
+   unlock_page(page);
+   put_kmapped_page(page);
+}
+
 int add_to_page_cache(struct page *page, struct address_space *mapping,
unsigned long index, gfp_t gfp_mask);
 int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/mm/filemap.c 
linux-2.6.21-rc6-mm1-test/mm/filemap.c
--- linux-2.6.21-rc6-mm1/mm/filemap.c   2007-04-11 14:26:42.0 -0700
+++ linux-2.6.21-rc6-mm1-test/mm/filemap.c  2007-04-10 21:46:03.0 
-0700
@@ -1600,115 +1600,53 @@ int generic_file_readonly_mmap(struct fi
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
-static struct page *__read_cache_page(struct address_space *mapping,
-   unsigned long index,
-   int (*filler)(void *,struct page*),
-   void *data)
-{
-   struct page *page, *cached_page = NULL;
-   int err;
-repeat:
-   page = find_get_page(mapping, index);
-   if (!page) {
-   if (!cached_page) {
-   cached_page = page_cache_alloc_cold(mapping);
-   if (!cached_page)
-   return ERR_PTR(-ENOMEM);
-   }
-   err = add_to_page_cache_lru(cached_page, mapping,
-

[PATCH 3/17] afs: convert afs_dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace afs_dir_get_page() and afs_dir_put_page() using the new
read_kmap_page() and put_kmapped_page() calls, and eliminate unnecessary
PageError checks.  Also, change the afs_dir_check_page() call to return
the page's error status, and update the call site accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/afs/dir.c 
linux-2.6.21-rc5-mm4-test/fs/afs/dir.c
--- linux-2.6.21-rc5-mm4/fs/afs/dir.c   2007-04-06 12:27:03.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/afs/dir.c  2007-04-06 14:30:22.0 
-0700
@@ -115,12 +115,15 @@ struct afs_dir_lookup_cookie {
 /*
  * check that a directory page is valid
  */
-static inline void afs_dir_check_page(struct inode *dir, struct page *page)
+static inline int afs_dir_check_page(struct inode *dir, struct page *page)
 {
struct afs_dir_page *dbuf;
loff_t latter;
int tmp, qty;
 
+   if (likely(PageChecked(page)))
+   return PageError(page);
+
 #if 0
/* check the page count */
qty = desc.size / sizeof(dbuf->blocks[0]);
@@ -154,52 +157,16 @@ static inline void afs_dir_check_page(st
}
 
SetPageChecked(page);
-   return;
+   return 0;
 
  error:
SetPageChecked(page);
SetPageError(page);
-
+   return 1;
 } /* end afs_dir_check_page() */
 
 /*/
 /*
- * discard a page cached in the pagecache
- */
-static inline void afs_dir_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-
-} /* end afs_dir_put_page() */
-
-/*/
-/*
- * get a page into the pagecache
- */
-static struct page *afs_dir_get_page(struct inode *dir, unsigned long index)
-{
-   struct page *page;
-
-   _enter("{%lu},%lu", dir->i_ino, index);
-
-   page = read_mapping_page(dir->i_mapping, index, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageChecked(page))
-   afs_dir_check_page(dir, page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
- fail:
-   afs_dir_put_page(page);
-   return ERR_PTR(-EIO);
-} /* end afs_dir_get_page() */
-
-/*/
-/*
  * open an AFS directory file
  */
 static int afs_dir_open(struct inode *inode, struct file *file)
@@ -344,11 +311,16 @@ static int afs_dir_iterate(struct inode 
blkoff = *fpos & ~(sizeof(union afs_dir_block) - 1);
 
/* fetch the appropriate page from the directory */
-   page = afs_dir_get_page(dir, blkoff / PAGE_SIZE);
+   page = read_kmap_page(dir->i_mapping, blkoff / PAGE_SIZE);
if (IS_ERR(page)) {
ret = PTR_ERR(page);
break;
}
+   if (afs_dir_check_page(dir, page)) {
+   ret = -EIO;
+   put_kmapped_page(page);
+   break;
+   }
 
limit = blkoff & ~(PAGE_SIZE - 1);
 
@@ -361,7 +333,7 @@ static int afs_dir_iterate(struct inode 
ret = afs_dir_iterate_block(fpos, dblock, blkoff,
cookie, filldir);
if (ret != 1) {
-   afs_dir_put_page(page);
+   put_kmapped_page(page);
goto out;
}
 
@@ -369,7 +341,7 @@ static int afs_dir_iterate(struct inode 
 
} while (*fpos < dir->i_size && blkoff < limit);
 
-   afs_dir_put_page(page);
+   put_kmapped_page(page);
ret = 0;
}
 
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/afs/mntpt.c 
linux-2.6.21-rc6-mm1-test/fs/afs/mntpt.c
--- linux-2.6.21-rc6-mm1/fs/afs/mntpt.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/afs/mntpt.c2007-04-10 21:22:07.0 
-0700
@@ -74,11 +74,6 @@ int afs_mntpt_check_symlink(struct afs_v
ret = PTR_ERR(page);
goto out;
}
-
-   ret = -EIO;
-   if (PageError(page))
-   goto out_free;
-
buf = kmap(page);
 
/* examine the symlink's contents */
@@ -98,7 +93,6 @@ int afs_mntpt_check_symlink(struct afs_v
ret = 0;
 
kunmap(page);
- out_free:
page_cache_release(page);
  out:
_leave(" = %d", ret);
@@ -180,10 +174,6 @@ static struct vfsmount *afs_mntpt_do_aut
goto error;
}
 
-   ret = -EIO;
-   if (PageError(page))
-   goto error;
-
buf = kmap(page);
memcpy(devname, buf, size);
kunmap(page);
-
To unsu

[PATCH 0/17] fs: cleanup single page synchronous read interface

2007-04-11 Thread Nate Diller
Nick Piggin recently changed the read_cache_page interface to be
synchronous, which is pretty much what the file systems want anyway.  Turns
out that they have more in common than that, though, and some of them want
to be able to get an uptodate *locked* page.  Many of them want a kmapped
page, which is uptodate and unlocked, and they all have their own individual
helper functions to achieve this.

Since the helper functions are so similar, this patch just combines them
into a small number of simple library functions, which call read_cache_page
(renamed to __read_cache_page because it now returns a locked page).  The
immediate result is a vast reduction in the number of fs-specific helper
functions.  The secondary goal is to reduce the number of places the page
lock is taken, and eliminate a lot of PageUptodate and PageError checks.

The file systems that still use PageChecked now have checker functions that
return an error if the page is corrupted or has some other error.  This
simplifies the logic since the checker function is not part of any helper
function anymore.
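
To illustrate, a converted call site ends up looking roughly like this (a
sketch following the ext2 convention from patch 4, where a failed check drops
the page reference itself; example_check_page is a hypothetical stand-in for
ext2_check_page(), afs_dir_check_page() and friends):

	page = read_kmap_page(mapping, n);
	if (IS_ERR(page) || example_check_page(page)) {
		/* read error, or the fs-specific check found corruption
		 * (the check releases the page before returning -EIO) */
		return -EIO;
	}
	/* ... parse page_address(page) ... */
	put_kmapped_page(page);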

Compile tested on x86_64.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 drivers/mtd/devices/block2mtd.c  |   28 +--
 fs/afs/dir.c |   56 +++---
 fs/afs/mntpt.c   |   10 --
 fs/cramfs/inode.c|3 
 fs/ext2/dir.c|   82 -
 fs/freevxfs/vxfs_extern.h|1 
 fs/freevxfs/vxfs_inode.c |2 
 fs/freevxfs/vxfs_lookup.c|4 -
 fs/freevxfs/vxfs_subr.c  |   33 
 fs/hfs/bnode.c   |4 -
 fs/hfsplus/bnode.c   |4 -
 fs/jffs2/fs.c|   27 ---
 fs/jffs2/gc.c|   15 ++-
 fs/jfs/jfs_metapage.c|5 -
 fs/minix/dir.c   |   59 ---
 fs/ntfs/aops.h   |   67 -
 fs/ntfs/bitmap.c |8 +-
 fs/ntfs/dir.c|   65 ++---
 fs/ntfs/index.c  |   12 +--
 fs/ntfs/lcnalloc.c   |6 -
 fs/ntfs/logfile.c|   12 +--
 fs/ntfs/mft.c|   53 +
 fs/ntfs/super.c  |   38 -
 fs/ntfs/usnjrnl.c|4 -
 fs/partitions/check.c|   14 +--
 fs/reiser4/plugin/file/tail_conversion.c |8 --
 fs/reiser4/plugin/item/extent_file_ops.c |9 --
 fs/reiserfs/xattr.c  |   48 ++--
 fs/sysv/dir.c|   19 +---
 fs/ufs/balloc.c  |8 +-
 fs/ufs/dir.c |   90 +--
 fs/ufs/truncate.c|8 +-
 fs/ufs/util.c|   52 -
 fs/ufs/util.h|   10 --
 include/linux/pagemap.h  |   53 -
 mm/filemap.c |  118 +++
 36 files changed, 315 insertions(+), 720 deletions(-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/13] fs: convert core functions to zero_user_page

2007-04-11 Thread Nate Diller

On 4/10/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Tue, 10 Apr 2007 20:36:00 -0700 Nate Diller <[EMAIL PROTECTED]> wrote:

> It's very common for file systems to need to zero part or all of a page, the
> simplest way is just to use kmap_atomic() and memset().  There's actually a
> library function in include/linux/highmem.h that does exactly that, but it's
> confusingly named memclear_highpage_flush(), which is descriptive of *how*
> it does the work rather than what the *purpose* is.  So this patchset
> renames the function to zero_user_page(), and calls it from the various
> places that currently open code it.
>
> This first patch introduces the new function call, and converts all the core
> kernel callsites, both the open-coded ones and the old
> memclear_highpage_flush() ones.  Following this patch is a series of
> conversions for each file system individually, per AKPM, and finally a patch
> deprecating the old call.

For the reasons Anton identified, I think it is better design while we're here
to force callers to pass in the kmap-type which they wish to use for the atomic
kmap.  It makes the programmer think about what he wants to happen.  The price
of getting this wrong tends to be revoltingly rare file corruption.


yeah, I actually agree with you, on thinking about it.  Thanks for
doing the conversion :)


But we cannot make this change in the obvious fashion, because the KM_FOO
identifiers are undefined if CONFIG_HIGHMEM=n.  So

zero_user_page(page, 1, 2, KM_USER0);

won't compile on non-highmem.

So we are forced to use a macro, like below.

Also, you forgot to mark memclear_highpage_flush() __deprecated.


that follows in a later patch ... for some reason I had trouble
compiling using your notation, and i had to add a function prototype
with the __deprecated flag. shrug.



And I'm surprised that this:

+static inline void memclear_highpage_flush(struct page *page, unsigned int 
offset, unsigned int size)
+{
+   return zero_user_page(page, offset, size);
+}

compiled.  zero_user_page() returns void...


it's funny, it didn't even warn about it.  also it seems your version
below is incomplete ... shouldn't it read:

+static inline void memclear_highpage_flush(struct page *page,
+   unsigned int offset, unsigned int size) __deprecated
{
-   return zero_user_page(page, offset, size);
+   zero_user_page(page, offset, size, KM_USER0);
}

NATE
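
For reference, the macro form under discussion would look something like the
sketch below (an illustration of the idea only, not the exact hunk from the
-mm patch):

	#define zero_user_page(page, offset, size, km_type)		\
		do {							\
			void *kaddr = kmap_atomic((page), (km_type));	\
			memset((char *)kaddr + (offset), 0, (size));	\
			flush_dcache_page(page);			\
			kunmap_atomic(kaddr, (km_type));		\
		} while (0)

The idea is that because zero_user_page() expands as a macro, the KM_* argument
is never evaluated on !CONFIG_HIGHMEM builds, so a call like
zero_user_page(page, 1, 2, KM_USER0) keeps compiling even where those
identifiers are undefined.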



 drivers/block/loop.c|2 +-
 fs/buffer.c |   21 -
 fs/direct-io.c  |2 +-
 fs/mpage.c  |6 --
 include/linux/highmem.h |   29 +
 mm/filemap_xip.c|2 +-
 6 files changed, 36 insertions(+), 26 deletions(-)

diff -puN 
drivers/block/loop.c~fs-convert-core-functions-to-zero_user_page-pass-kmap-type 
drivers/block/loop.c
--- 
a/drivers/block/loop.c~fs-convert-core-functions-to-zero_user_page-pass-kmap-type
+++ a/drivers/block/loop.c
@@ -250,7 +250,7 @@ static int do_lo_send_aops(struct loop_d
 */
printk(KERN_ERR "loop: transfer error block %llu\n",
   (unsigned long long)index);
-   zero_user_page(page, offset, size);
+   zero_user_page(page, offset, size, KM_USER0);
}
flush_dcache_page(page);
ret = aops->commit_write(file, page, offset,
diff -puN 
fs/buffer.c~fs-convert-core-functions-to-zero_user_page-pass-kmap-type 
fs/buffer.c
--- a/fs/buffer.c~fs-convert-core-functions-to-zero_user_page-pass-kmap-type
+++ a/fs/buffer.c
@@ -1855,7 +1855,7 @@ static int __block_prepare_write(struct
break;
if (buffer_new(bh)) {
clear_buffer_new(bh);
-   zero_user_page(page, block_start, bh->b_size);
+   zero_user_page(page, block_start, bh->b_size, KM_USER0);
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
}
@@ -1943,7 +1943,8 @@ int block_read_full_page(struct page *pa
SetPageError(page);
}
if (!buffer_mapped(bh)) {
-   zero_user_page(page, i * blocksize, blocksize);
+   zero_user_page(page, i * blocksize, blocksize,
+   KM_USER0);
if (!err)
set_buffer_uptodate(bh);
continue;
@@ -2107,7 +2108,8 @@ int cont_prepare_write(struct page *page
PAGE_CACHE_SIZE, get_block);
if (status)
goto out_unmap;
-   zero_user_page(page, zerofrom, PA

[PATCH 9/17] minix: convert dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace minix dir_get_page() and dir_put_page() using the new
read_kmap_page() and put_kmapped_page()/put_locked_page() calls.  Also, use
__read_kmap_page() instead of re-taking the page_lock.
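
The locked variant matters in minix_add_link(): __read_kmap_page() hands back
a page that is already locked and kmapped, so the explicit lock_page() after
the read disappears.  Roughly (a sketch only, error paths elided):

	page = __read_kmap_page(dir->i_mapping, n);	/* locked + kmapped */
	if (IS_ERR(page))
		return PTR_ERR(page);
	/* ... scan/modify the directory block under the page lock ... */
	put_locked_page(page);				/* unlock, kunmap, release */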

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/minix/dir.c 
linux-2.6.21-rc5-mm4-test/fs/minix/dir.c
--- linux-2.6.21-rc5-mm4/fs/minix/dir.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/minix/dir.c2007-04-06 02:31:55.0 
-0700
@@ -23,12 +23,6 @@ const struct file_operations minix_dir_o
.fsync  = minix_sync_file,
 };
 
-static inline void dir_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
 /*
  * Return the offset into page `page_nr' of the last valid
  * byte in that page, plus one.
@@ -60,22 +54,6 @@ static int dir_commit_chunk(struct page 
return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir-i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageUptodate(page))
-   goto fail;
-   }
-   return page;
-
-fail:
-   dir_put_page(page);
-   return ERR_PTR(-EIO);
-}
-
 static inline void *minix_next_entry(void *de, struct minix_sb_info *sbi)
 {
return (void*)((char*)de + sbi-s_dirsize);
@@ -102,7 +80,7 @@ static int minix_readdir(struct file * f
 
for ( ; n  npages; n++, offset = 0) {
char *p, *kaddr, *limit;
-   struct page *page = dir_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
if (IS_ERR(page))
continue;
@@ -128,12 +106,12 @@ static int minix_readdir(struct file * f
(n  PAGE_CACHE_SHIFT) | offset,
inumber, DT_UNKNOWN);
if (over) {
-   dir_put_page(page);
+   put_kmapped_page(page);
goto done;
}
}
}
-   dir_put_page(page);
+   put_kmapped_page(page);
}
 
 done:
@@ -177,7 +155,7 @@ minix_dirent *minix_find_entry(struct de
for (n = 0; n  npages; n++) {
char *kaddr, *limit;
 
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
if (IS_ERR(page))
continue;
 
@@ -198,7 +176,7 @@ minix_dirent *minix_find_entry(struct de
if (namecompare(namelen, sbi-s_namelen, name, namx))
goto found;
}
-   dir_put_page(page);
+   put_kmapped_page(page);
}
return NULL;
 
@@ -233,11 +211,10 @@ int minix_add_link(struct dentry *dentry
for (n = 0; n = npages; n++) {
char *limit, *dir_end;
 
-   page = dir_get_page(dir, n);
+   page = __read_kmap_page(dir->i_mapping, n);
err = PTR_ERR(page);
if (IS_ERR(page))
goto out;
-   lock_page(page);
kaddr = (char*)page_address(page);
dir_end = kaddr + minix_last_byte(dir, n);
limit = kaddr + PAGE_CACHE_SIZE - sbi-s_dirsize;
@@ -265,8 +242,7 @@ int minix_add_link(struct dentry *dentry
if (namecompare(namelen, sbi-s_namelen, name, namx))
goto out_unlock;
}
-   unlock_page(page);
-   dir_put_page(page);
+   put_locked_page(page);
}
BUG();
return -EINVAL;
@@ -288,13 +264,12 @@ got_it:
err = dir_commit_chunk(page, from, to);
dir-i_mtime = dir-i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
-out_put:
-   dir_put_page(page);
+   put_kmapped_page(page);
 out:
return err;
 out_unlock:
-   unlock_page(page);
-   goto out_put;
+   put_locked_page(page);
+   return err;
 }
 
 int minix_delete_entry(struct minix_dir_entry *de, struct page *page)
@@ -314,7 +289,7 @@ int minix_delete_entry(struct minix_dir_
} else {
unlock_page(page);
}
-   dir_put_page(page);
+   put_kmapped_page(page);
inode-i_ctime = inode-i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
return err;
@@ -378,7 +353,7 @@ int minix_empty_dir(struct inode * inode
for (i = 0; i  npages; i++) {
char *p, *kaddr, *limit;
 
-   page = dir_get_page(inode, i);
+   page = read_kmap_page(inode->i_mapping, i);
if (IS_ERR(page

[PATCH 10/17] mtd: convert page_read to read_kmap_page

2007-04-11 Thread Nate Diller
Replace page_read() with read_kmap_page()/__read_kmap_page().  This probably
fixes behaviour on highmem systems, since page_address() was being used
without kmap().  Also eliminate the need to re-take the page lock during
writes to the page.
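
The highmem point is simply that page_address() only returns a usable pointer
once the page has a kernel mapping, which read_kmap_page() now provides; the
read path reduces to roughly this (a sketch, variable names as in the hunk
below):

	page = read_kmap_page(dev->blkdev->bd_inode->i_mapping, index);
	if (IS_ERR(page))
		return PTR_ERR(page);
	memcpy(buf, page_address(page) + offset, cpylen); /* valid for highmem pages too */
	put_kmapped_page(page);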

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c 
linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c
--- linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c2007-04-05 
17:14:24.0 -0700
+++ linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c   2007-04-06 
01:59:19.0 -0700
@@ -39,12 +39,6 @@ struct block2mtd_dev {
 /* Static info about the MTD, used in cleanup_module */
 static LIST_HEAD(blkmtd_device_list);
 
-
-static struct page *page_read(struct address_space *mapping, int index)
-{
-   return read_mapping_page(mapping, index, NULL);
-}
-
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
@@ -56,23 +50,19 @@ static int _block2mtd_erase(struct block
u_long *max;
 
while (pages) {
-   page = page_read(mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = __read_kmap_page(mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
max = page_address(page) + PAGE_SIZE;
for (p=page_address(page); p<max; p++)
if (*p != -1UL) {
-   lock_page(page);
memset(page_address(page), 0xff, PAGE_SIZE);
set_page_dirty(page);
-   unlock_page(page);
break;
}
 
-   page_cache_release(page);
+   put_locked_page(page);
pages--;
index++;
}
@@ -125,14 +115,12 @@ static int block2mtd_read(struct mtd_inf
cpylen = len;   // this page
len = len - cpylen;
 
-   page = page_read(dev->blkdev->bd_inode->i_mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = read_kmap_page(dev->blkdev->bd_inode->i_mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
memcpy(buf, page_address(page) + offset, cpylen);
-   page_cache_release(page);
+   put_kmapped_page(page);
 
if (retlen)
*retlen += cpylen;
@@ -163,19 +151,15 @@ static int _block2mtd_write(struct block
cpylen = len;   // this page
len = len - cpylen;
 
-   page = page_read(mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = __read_kmap_page(mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
if (memcmp(page_address(page)+offset, buf, cpylen)) {
-   lock_page(page);
memcpy(page_address(page) + offset, buf, cpylen);
set_page_dirty(page);
-   unlock_page(page);
}
-   page_cache_release(page);
+   put_locked_page(page);
 
if (retlen)
*retlen += cpylen;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 13/17] reiser4: remove redundant read_mapping_page error checks

2007-04-11 Thread Nate Diller
read_mapping_page() is now fully synchronous, so there's no need to wait for
the page lock or to check for I/O errors.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c   
2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c  
2007-04-10 21:33:47.0 -0700
@@ -608,14 +608,6 @@ int extent2tail(unix_file_info_t *uf_inf
break;
}
 
-   wait_on_page_locked(page);
-
-   if (!PageUptodate(page)) {
-   page_cache_release(page);
-   result = RETERR(-EIO);
-   break;
-   }
-
/* cut part of file we have read */
start_byte = (__u64) (i << PAGE_CACHE_SHIFT);
set_key_offset(from, start_byte);
diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c   
2007-04-10 19:41:14.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c  
2007-04-10 21:38:41.0 -0700
@@ -1220,15 +1220,8 @@ int reiser4_read_extent(struct file *fil
page = read_mapping_page(mapping, cur_page, file);
if (IS_ERR(page))
return PTR_ERR(page);
-   lock_page(page);
-   if (!PageUptodate(page)) {
-   unlock_page(page);
-   page_cache_release(page);
-   warning("jmacd-97178", "extent_read: page is not up to date");
-   return RETERR(-EIO);
-   }
+
mark_page_accessed(page);
-   unlock_page(page);
 
/* If users can be writing to this page using arbitrary virtual
   addresses, take care about potential aliasing before reading
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/17] partition: remove redundant read_mapping_page error checks

2007-04-11 Thread Nate Diller
Remove unneeded PageError checking in read_dev_sector(), and clean up the
code a bit.

Can anyone point out why it's OK to use page_address() here on a page which
has not been kmapped?  If it's not OK, then a good number of callers need to
be fixed.
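
(For comparison, the conservative pattern when a pagecache page might live in
highmem is to bracket the access with kmap()/kunmap() -- a generic sketch, not
a claim about what the block-device mapping guarantees here:)

	page = read_mapping_page(mapping, n, NULL);
	if (IS_ERR(page))
		return NULL;
	data = kmap(page);		/* gives the page a kernel mapping */
	/* ... copy out the sector ... */
	kunmap(page);
	page_cache_release(page);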

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/partitions/check.c 
linux-2.6.21-rc6-mm1-test/fs/partitions/check.c
--- linux-2.6.21-rc6-mm1/fs/partitions/check.c  2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/partitions/check.c 2007-04-10 
21:59:01.0 -0700
@@ -568,16 +568,12 @@ unsigned char *read_dev_sector(struct bl
 
page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)),
 NULL);
-   if (!IS_ERR(page)) {
-   if (PageError(page))
-   goto fail;
-   p->v = page;
-   return (unsigned char *)page_address(page) +  ((n & ((1 <<
(PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
-fail:
-   page_cache_release(page);
+   if (IS_ERR(page)) {
+   p->v = NULL;
+   return NULL;
}
-   p->v = NULL;
-   return NULL;
+   p->v = page;
+   return (unsigned char *)page_address(page) +  ((n & ((1 <<
(PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
 }
 
 EXPORT_SYMBOL(read_dev_sector);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 15/17] sysv: convert dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace sysv dir_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/sysv/dir.c 
linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c
--- linux-2.6.21-rc5-mm4/fs/sysv/dir.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c 2007-04-06 01:59:19.0 
-0700
@@ -50,15 +50,6 @@ static int dir_commit_chunk(struct page 
return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir-i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page))
-   kmap(page);
-   return page;
-}
-
 static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
unsigned long pos = filp-f_pos;
@@ -77,7 +68,7 @@ static int sysv_readdir(struct file * fi
for ( ; n  npages; n++, offset = 0) {
char *kaddr, *limit;
struct sysv_dir_entry *de;
-   struct page *page = dir_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
if (IS_ERR(page))
continue;
@@ -149,7 +140,7 @@ struct sysv_dir_entry *sysv_find_entry(s
 
do {
char *kaddr;
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
if (!IS_ERR(page)) {
kaddr = (char*)page_address(page);
de = (struct sysv_dir_entry *) kaddr;
@@ -191,7 +182,7 @@ int sysv_add_link(struct dentry *dentry,
 
/* We take care of directory expansion in the same loop */
for (n = 0; n = npages; n++) {
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
err = PTR_ERR(page);
if (IS_ERR(page))
goto out;
@@ -299,7 +290,7 @@ int sysv_empty_dir(struct inode * inode)
for (i = 0; i  npages; i++) {
char *kaddr;
struct sysv_dir_entry * de;
-   page = dir_get_page(inode, i);
+   page = read_kmap_page(inode->i_mapping, i);
 
if (IS_ERR(page))
continue;
@@ -353,7 +344,7 @@ void sysv_set_link(struct sysv_dir_entry
 
 struct sysv_dir_entry * sysv_dotdot (struct inode *dir, struct page **p)
 {
-   struct page *page = dir_get_page(dir, 0);
+   struct page *page = read_kmap_page(dir->i_mapping, 0);
struct sysv_dir_entry *de = NULL;
 
if (!IS_ERR(page)) {
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 16/17] ufs: convert ufs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace ufs_get_page()/ufs_get_locked_page() and
ufs_put_page()/ufs_put_locked_page() using the new read_kmap_page() and
put_kmapped_page() calls and their locking variants.  Also, change the
ufs_check_page() call to return the page's error status, and update the
call sites accordingly.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/balloc.c 
linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c
--- linux-2.6.21-rc5-mm4/fs/ufs/balloc.c2007-04-05 17:13:29.0 
-0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c   2007-04-06 12:46:02.0 
-0700
@@ -272,7 +272,7 @@ static void ufs_change_blocknr(struct in
index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
 
if (likely(cur_index != index)) {
-   page = ufs_get_locked_page(mapping, index);
+   page = __read_mapping_page(mapping, index, NULL);
if (!page)/* it was truncated */
continue;
if (IS_ERR(page)) {/* or EIO */
@@ -325,8 +325,10 @@ static void ufs_change_blocknr(struct in
bh = bh-b_this_page;
} while (bh != head);
 
-   if (likely(cur_index != index))
-   ufs_put_locked_page(page);
+   if (likely(cur_index != index)) {
+   unlock_page(page);
+   page_cache_release(page);
+   }
}
UFSD("EXIT\n");
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/truncate.c 
linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c
--- linux-2.6.21-rc5-mm4/fs/ufs/truncate.c  2007-04-05 17:13:29.0 
-0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c 2007-04-06 12:46:14.0 
-0700
@@ -395,8 +395,9 @@ static int ufs_alloc_lastblock(struct in
 
lastfrag--;
 
-   lastpage = ufs_get_locked_page(mapping, lastfrag >>
-  (PAGE_CACHE_SHIFT - inode->i_blkbits));
+   lastpage = __read_mapping_page(mapping, lastfrag >>
+  (PAGE_CACHE_SHIFT - inode->i_blkbits),
+  NULL);
if (IS_ERR(lastpage)) {
err = -EIO;
goto out;
@@ -441,7 +442,8 @@ static int ufs_alloc_lastblock(struct in
   }
}
 out_unlock:
-   ufs_put_locked_page(lastpage);
+   unlock_page(lastpage);
+   page_cache_release(lastpage);
 out:
return err;
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.c 
linux-2.6.21-rc5-mm4-test/fs/ufs/util.c
--- linux-2.6.21-rc5-mm4/fs/ufs/util.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.c 2007-04-06 12:40:53.0 
-0700
@@ -232,55 +232,3 @@ ufs_set_inode_dev(struct super_block *sb
ufsi-i_u1.i_data[0] = cpu_to_fs32(sb, fs32);
 }
 
-/**
- * ufs_get_locked_page() - locate, pin and lock a pagecache page, if not exist
- * read it from disk.
- * @mapping: the address_space to search
- * @index: the page index
- *
- * Locates the desired pagecache page, if not exist we'll read it,
- * locks it, increments its reference
- * count and returns its address.
- *
- */
-
-struct page *ufs_get_locked_page(struct address_space *mapping,
-pgoff_t index)
-{
-   struct page *page;
-
-   page = find_lock_page(mapping, index);
-   if (!page) {
-   page = read_mapping_page(mapping, index, NULL);
-
-   if (IS_ERR(page)) {
-   printk(KERN_ERR ufs_change_blocknr: 
-  read_mapping_page error: ino %lu, index: %lu\n,
-  mapping-host-i_ino, index);
-   goto out;
-   }
-
-   lock_page(page);
-
-   if (unlikely(page-mapping == NULL)) {
-   /* Truncate got there first */
-   unlock_page(page);
-   page_cache_release(page);
-   page = NULL;
-   goto out;
-   }
-
-   if (!PageUptodate(page) || PageError(page)) {
-   unlock_page(page);
-   page_cache_release(page);
-
-   printk(KERN_ERR ufs_change_blocknr: 
-  can not read page: ino %lu, index: %lu\n,
-  mapping-host-i_ino, index);
-
-   page = ERR_PTR(-EIO);
-   }
-   }
-out:
-   return page;
-}
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.h 
linux-2.6.21-rc5-mm4-test/fs/ufs/util.h
--- linux-2.6.21-rc5-mm4/fs/ufs/util.h  2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.h 2007-04-06 12:46:36.0 
-0700
@@ -251,16 +251,6 @@ extern void _ubh_ubhcpymem_(struct ufs_s
 #define ubh_memcpyubh(ubh,mem,size) _ubh_memcpyubh_(uspi,ubh,mem

[PATCH 17/17] vxfs: convert vxfs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace vxfs_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h  2007-04-05 
17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h 2007-04-06 
01:59:19.0 -0700
@@ -69,7 +69,6 @@ extern const struct file_operations   vxfs
 extern int vxfs_read_olt(struct super_block *, u_long);
 
 /* vxfs_subr.c */
-extern struct page *   vxfs_get_page(struct address_space *, u_long);
 extern voidvxfs_put_page(struct page *);
 extern struct buffer_head *vxfs_bread(struct inode *, int);
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c   2007-04-05 
17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c  2007-04-06 
01:59:19.0 -0700
@@ -138,7 +138,7 @@ __vxfs_iget(ino_t ino, struct inode *ili
u_long  offset;
 
offset = (ino % (PAGE_SIZE / VXFS_ISIZE)) * VXFS_ISIZE;
-   pp = vxfs_get_page(ilistp-i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
+   pp = read_kmap_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
 
if (!IS_ERR(pp)) {
struct vxfs_inode_info  *vip;
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c  2007-04-05 
17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c 2007-04-06 
01:59:19.0 -0700
@@ -125,7 +125,7 @@ vxfs_find_entry(struct inode *ip, struct
caddr_t kaddr;
struct page *pp;
 
-   pp = vxfs_get_page(ip-i_mapping, page);
+   pp = read_kmap_page(ip->i_mapping, page);
if (IS_ERR(pp))
continue;
kaddr = (caddr_t)page_address(pp);
@@ -280,7 +280,7 @@ vxfs_readdir(struct file *fp, void *retp
caddr_t kaddr;
struct page *pp;
 
-   pp = vxfs_get_page(ip-i_mapping, page);
+   pp = read_kmap_page(ip->i_mapping, page);
if (IS_ERR(pp))
continue;
kaddr = (caddr_t)page_address(pp);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c 
linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c2007-04-05 
17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c   2007-04-06 
01:59:19.0 -0700
@@ -56,39 +56,6 @@ vxfs_put_page(struct page *pp)
 }
 
 /**
- * vxfs_get_page - read a page into memory.
- * @ip:inode to read from
- * @n: page number
- *
- * Description:
- *   vxfs_get_page reads the @n th page of @ip into the pagecache.
- *
- * Returns:
- *   The wanted page on success, else a NULL pointer.
- */
-struct page *
-vxfs_get_page(struct address_space *mapping, u_long n)
-{
-   struct page *   pp;
-
-   pp = read_mapping_page(mapping, n, NULL);
-
-   if (!IS_ERR(pp)) {
-   kmap(pp);
-   /** if (!PageChecked(pp)) **/
-   /** vxfs_check_page(pp); **/
-   if (PageError(pp))
-   goto fail;
-   }
-   
-   return (pp);
-
-fail:
-   vxfs_put_page(pp);
-   return ERR_PTR(-EIO);
-}
-
-/**
  * vxfs_bread - read buffer for a give inode,block tuple
  * @ip:inode
  * @block: logical block
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/17] cramfs: use read_mapping_page

2007-04-11 Thread Nate Diller
read_mapping_page_async() is going away, so convert its only user to
read_mapping_page().  This change has not been benchmarked; however, to get
real parallelism this wants something completely different,
like __do_page_cache_readahead(), which is not currently exported.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/cramfs/inode.c 
linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c
--- linux-2.6.21-rc6-mm1/fs/cramfs/inode.c  2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c 2007-04-09 21:37:09.0 
-0700
@@ -180,8 +180,7 @@ static void *cramfs_read(struct super_bl
struct page *page = NULL;
 
if (blocknr + i < devsize) {
-   page = read_mapping_page_async(mapping, blocknr + i,
-   NULL);
+   page = read_mapping_page(mapping, blocknr + i, NULL);
/* synchronous error? */
if (IS_ERR(page))
page = NULL;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/17] hfsplus: remove redundant read_mapping_page error check

2007-04-11 Thread Nate Diller
Now that read_mapping_page() does error checking internally, there is no
need to check PageError here.

Signed-off-by: Nate Diller [EMAIL PROTECTED] 

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c 
linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c 2007-04-09 17:20:13.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c2007-04-10 
21:28:45.0 -0700
@@ -442,10 +442,6 @@ static struct hfs_bnode *__hfs_bnode_cre
page = read_mapping_page(mapping, block, NULL);
if (IS_ERR(page))
goto fail;
-   if (PageError(page)) {
-   page_cache_release(page);
-   goto fail;
-   }
page_cache_release(page);
node->page[i] = page;
}
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 14/17] reiserfs: convert reiserfs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace reiserfs_get_page() and reiserfs_put_page() with the new
read_kmap_page() and put_kmapped_page() calls and their locking variants.
Also, propagate the gfp_mask deadlock comment to the callsites.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---
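
The new helpers themselves are not reproduced in this mail; judging from the
call sites and from the open-coded reiserfs_get_page()/reiserfs_put_page()
they replace, they behave roughly like the sketch below, with
__read_kmap_page() returning the page still locked and put_locked_page()
unlocking it before the final put:

    /* sketch only, see the real definitions elsewhere in the series */
    static inline struct page *read_kmap_page(struct address_space *mapping,
                                              unsigned long index)
    {
            struct page *page = read_mapping_page(mapping, index, NULL);

            if (!IS_ERR(page))
                    kmap(page);     /* callers then use page_address(page) */
            return page;
    }

    static inline void put_kmapped_page(struct page *page)
    {
            kunmap(page);
            page_cache_release(page);
    }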

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c 
linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c
--- linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c2007-04-05 17:14:25.0 
-0700
+++ linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c   2007-04-06 
14:41:34.0 -0700
@@ -438,33 +438,6 @@ int xattr_readdir(struct file *file, fil
return res;
 }
 
-/* Internal operations on file data */
-static inline void reiserfs_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
-static struct page *reiserfs_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page;
-   /* We can deadlock if we try to free dentries,
-  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
-   mapping_set_gfp_mask(mapping, GFP_NOFS);
-   page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
-  fail:
-   reiserfs_put_page(page);
-   return ERR_PTR(-EIO);
-}
-
 static inline __u32 xattr_hash(const char *msg, int len)
 {
return csum_partial(msg, len, 0);
@@ -537,13 +510,15 @@ reiserfs_xattr_set(struct inode *inode, 
else
chunk = buffer_size - buffer_pos;
 
-   page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+   /* We can deadlock if we try to free dentries,
+  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
+   mapping_set_gfp_mask(mapping, GFP_NOFS);
+   page = __read_kmap_page(mapping, file_pos >> PAGE_CACHE_SHIFT);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out_filp;
}
 
-   lock_page(page);
data = page_address(page);
 
if (file_pos == 0) {
@@ -566,8 +541,7 @@ reiserfs_xattr_set(struct inode *inode, 
 page_offset + chunk +
 skip);
}
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
buffer_pos += chunk;
file_pos += chunk;
skip = 0;
@@ -646,13 +620,15 @@ reiserfs_xattr_get(const struct inode *i
else
chunk = isize - file_pos;
 
-   page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+   /* We can deadlock if we try to free dentries,
+  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
+   mapping_set_gfp_mask(xinode->i_mapping, GFP_NOFS);
+   page = __read_kmap_page(xinode->i_mapping, file_pos >> PAGE_CACHE_SHIFT);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out_dput;
}
 
-   lock_page(page);
data = page_address(page);
if (file_pos == 0) {
struct reiserfs_xattr_header *rxh =
@@ -661,8 +637,7 @@ reiserfs_xattr_get(const struct inode *i
chunk -= skip;
/* Magic doesn't match up.. */
if (rxh->h_magic != cpu_to_le32(REISERFS_XATTR_MAGIC)) {
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
reiserfs_warning(inode->i_sb,
 "Invalid magic for xattr (%s) "
 "associated with %k", name,
@@ -673,8 +648,7 @@ reiserfs_xattr_get(const struct inode *i
hash = le32_to_cpu(rxh->h_hash);
}
memcpy(buffer + buffer_pos, data + skip, chunk);
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
file_pos += chunk;
buffer_pos += chunk;
skip = 0;


[PATCH 11/17] ntfs: convert ntfs_map_page to read_kmap_page

2007-04-11 Thread Nate Diller
Replace ntfs_map_page() and ntfs_unmap_page() with the new read_kmap_page()
and put_kmapped_page() calls and their locking variants, and remove the now
unneeded PageError checking.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/aops.h 
linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h
--- linux-2.6.21-rc5-mm4/fs/ntfs/aops.h 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h2007-04-06 01:59:19.0 
-0700
@@ -31,73 +31,6 @@
 
#include "inode.h"
 
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page:  the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping:   address space for which to obtain the page
- * @index: index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_CACHE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the readpage
- * method defined in the address space operations of @mapping and the page is
- * added to the page cache of @mapping in the process.
- *
- * If the page belongs to an mst protected attribute and it is marked as such
- * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
- * error checking is performed.  This means the caller has to verify whether
- * the ntfs record(s) contained in the page are valid or not using one of the
- * ntfs_is__record{,p}() macros, where  is the record type you are
- * expecting to see.  (For details of the macros, see fs/ntfs/layout.h.)
- *
- * If the page is in high memory it is mapped into memory directly addressible
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to 'true', the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
-   unsigned long index)
-{
-   struct page *page = read_mapping_page(mapping, index, NULL);
-
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageError(page))
-   return page;
-   ntfs_unmap_page(page);
-   return ERR_PTR(-EIO);
-   }
-   return page;
-}
-
 #ifdef NTFS_RW
 
 extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c 
linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c
--- linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c   2006-11-29 13:57:37.0 
-0800
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c  2007-04-06 12:40:53.0 
-0700
@@ -72,7 +72,7 @@ int __ntfs_bitmap_set_bits_in_run(struct
 
/* Get the page containing the first bit (@start_bit). */
mapping = vi->i_mapping;
-   page = ntfs_map_page(mapping, index);
+   page = read_kmap_page(mapping, index);
if (IS_ERR(page)) {
if (!is_rollback)
ntfs_error(vi->i_sb, "Failed to map first page (error 
@@ -123,8 +123,8 @@ int __ntfs_bitmap_set_bits_in_run(struct
/* Update @index and get the next page. */
flush_dcache_page(page);
set_page_dirty(page);
-   ntfs_unmap_page(page);
-   page = ntfs_map_page(mapping, ++index);
+   put_kmapped_page(page);
+   page = read_kmap_page(mapping, ++index);
if (IS_ERR(page))
goto rollback;
kaddr = page_address(page);
@@ -159,7 +159,7 @@ done:
/* We are done.  Unmap the page and return success. */
flush_dcache_page(page);
set_page_dirty(page);
-   ntfs_unmap_page(page);
+   put_kmapped_page(page);
ntfs_debug("Done.");
return 0;
 rollback:
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/dir.c 
linux-2.6.21

[PATCH 6/17] hfs: remove redundant read_mapping_page error check

2007-04-11 Thread Nate Diller
Now that read_mapping_page() does error checking internally, there is no
need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfs/bnode.c 
linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfs/bnode.c 2007-04-09 17:20:13.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c2007-04-10 21:28:03.0 
-0700
@@ -282,10 +282,6 @@ static struct hfs_bnode *__hfs_bnode_cre
page = read_mapping_page(mapping, block++, NULL);
if (IS_ERR(page))
goto fail;
-   if (PageError(page)) {
-   page_cache_release(page);
-   goto fail;
-   }
page_cache_release(page);
node->page[i] = page;
}


[PATCH 7/17] jffs2: convert jffs2_gc_fetch_page to read_cache_page

2007-04-11 Thread Nate Diller
Replace jffs2_gc_fetch_page() and jffs2_gc_release_page() with direct calls to
read_cache_page() and kmap()/kunmap(), and update the call site accordingly.
The explicit kmap()/kunmap() calls make the code clearer.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---
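
The resulting call shape is roughly the sketch below (mapping, index, filler
and data are placeholders; in this patch the filler is
jffs2_do_readpage_unlock() and data is the inode):

    struct page *page;
    unsigned char *kaddr;

    page = read_cache_page(mapping, index, filler, data);
    if (IS_ERR(page))
            return PTR_ERR(page);
    kaddr = kmap(page);
    /* ... work on kaddr ... */
    kunmap(page);
    page_cache_release(page);

read_mapping_page() is the convenience variant that plugs the mapping's own
->readpage in as the filler; jffs2 keeps the explicit form so it can pass its
own jffs2_do_readpage_unlock() helper.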

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/fs.c 
linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/fs.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c 2007-04-06 01:59:19.0 
-0700
@@ -621,33 +621,6 @@ struct jffs2_inode_info *jffs2_gc_fetch_
return JFFS2_INODE_INFO(inode);
 }
 
-unsigned char *jffs2_gc_fetch_page(struct jffs2_sb_info *c,
-  struct jffs2_inode_info *f,
-  unsigned long offset,
-  unsigned long *priv)
-{
-   struct inode *inode = OFNI_EDONI_2SFFJ(f);
-   struct page *pg;
-
-   pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
-(void *)jffs2_do_readpage_unlock, inode);
-   if (IS_ERR(pg))
-   return (void *)pg;
-
-   *priv = (unsigned long)pg;
-   return kmap(pg);
-}
-
-void jffs2_gc_release_page(struct jffs2_sb_info *c,
-  unsigned char *ptr,
-  unsigned long *priv)
-{
-   struct page *pg = (void *)*priv;
-
-   kunmap(pg);
-   page_cache_release(pg);
-}
-
 static int jffs2_flash_setup(struct jffs2_sb_info *c) {
int ret = 0;
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/gc.c 
linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/gc.c  2007-04-05 17:13:10.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c 2007-04-06 01:59:19.0 
-0700
@@ -1078,7 +1078,7 @@ static int jffs2_garbage_collect_dnode(s
uint32_t alloclen, offset, orig_end, orig_start;
int ret = 0;
unsigned char *comprbuf = NULL, *writebuf;
-   unsigned long pg;
+   struct page *page;
unsigned char *pg_ptr;
 
memset(ri, 0, sizeof(ri));
@@ -1219,12 +1219,16 @@ static int jffs2_garbage_collect_dnode(s
 *page OK. We'll actually write it out again in commit_write, which 
is a little
 *suboptimal, but at least we're correct.
 */
-   pg_ptr = jffs2_gc_fetch_page(c, f, start, pg);
+   page = read_cache_page(OFNI_EDONI_2SFFJ(f)->i_mapping,
+   start >> PAGE_CACHE_SHIFT,
+   (void *)jffs2_do_readpage_unlock,
+   OFNI_EDONI_2SFFJ(f));
 
-   if (IS_ERR(pg_ptr)) {
+   if (IS_ERR(page)) {
printk(KERN_WARNING "read_cache_page() returned error: %ld\n", 
PTR_ERR(pg_ptr));
-   return PTR_ERR(pg_ptr);
+   return PTR_ERR(page);
}
+   pg_ptr = kmap(page);
 
offset = start;
while(offset < orig_end) {
@@ -1287,6 +1291,7 @@ static int jffs2_garbage_collect_dnode(s
}
}
 
-   jffs2_gc_release_page(c, pg_ptr, pg);
+   kunmap(page);
+   page_cache_release(page);
return ret;
 }


[PATCH 8/13] ntfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ntfs/aops.c 
linux-2.6.21-rc6-mm1-test/fs/ntfs/aops.c
--- linux-2.6.21-rc6-mm1/fs/ntfs/aops.c 2007-04-09 10:41:47.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/ntfs/aops.c2007-04-09 18:18:23.0 
-0700
@@ -245,8 +241,7 @@ static int ntfs_read_block(struct page *
rl = NULL;
nr = i = 0;
do {
-   u8 *kaddr;
-   int err;
+   int err = 0;
 
if (unlikely(buffer_uptodate(bh)))
continue;
@@ -254,7 +249,6 @@ static int ntfs_read_block(struct page *
arr[nr++] = bh;
continue;
}
-   err = 0;
bh->b_bdev = vol->sb->s_bdev;
/* Is the block within the allowed limits? */
if (iblock < lblock) {
@@ -340,10 +334,7 @@ handle_hole:
bh->b_blocknr = -1UL;
clear_buffer_mapped(bh);
 handle_zblock:
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + i * blocksize, 0, blocksize);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, i * blocksize, blocksize);
if (likely(!err))
set_buffer_uptodate(bh);
} while (i++, iblock++, (bh = bh->b_this_page) != head);
@@ -460,10 +451,7 @@ retry_readpage:
 * ok to ignore the compressed flag here.
 */
if (unlikely(page->index > 0)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr, 0, PAGE_CACHE_SIZE);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, 0, PAGE_CACHE_SIZE);
goto done;
}
if (!NInoAttr(ni))
@@ -790,14 +778,9 @@ lock_retry_remap:
 * uptodate so it can get discarded by the VM.
 */
if (err == -ENOENT || lcn == LCN_ENOENT) {
-   u8 *kaddr;
-
bh->b_blocknr = -1;
clear_buffer_dirty(bh);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + bh_offset(bh), 0, blocksize);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, bh_offset(bh), blocksize);
set_buffer_uptodate(bh);
err = 0;
continue;
@@ -1422,10 +1405,7 @@ retry_writepage:
if (page->index >= (i_size >> PAGE_CACHE_SHIFT)) {
/* The page straddles i_size. */
unsigned int ofs = i_size & ~PAGE_CACHE_MASK;
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + ofs, 0, PAGE_CACHE_SIZE - ofs);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, ofs, PAGE_CACHE_SIZE - ofs);
}
/* Handle mst protected attributes. */
if (NInoMstProtected(ni))
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ntfs/file.c 
linux-2.6.21-rc6-mm1-test/fs/ntfs/file.c
--- linux-2.6.21-rc6-mm1/fs/ntfs/file.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/ntfs/file.c2007-04-09 18:18:23.0 
-0700
@@ -606,11 +606,8 @@ do_next_page:
ntfs_submit_bh_for_read(bh);
*wait_bh++ = bh;
} else {
-   u8 *kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + bh_offset(bh), 0,
+   zero_user_page(page, bh_offset(bh),
blocksize);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
set_buffer_uptodate(bh);
}
}
@@ -685,12 +682,8 @@ map_buffer_cached:
ntfs_submit_bh_for_read(bh);
*wait_bh++ = bh;
} else {
-   u8 *kaddr = kmap_atomic(page,
-   KM_USER0);
-   memset(kaddr + bh_offset(bh),
-   0, blocksize);
- 

[PATCH 2/13] affs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/affs/file.c 
linux-2.6.21-rc6-mm1-test/fs/affs/file.c
--- linux-2.6.21-rc6-mm1/fs/affs/file.c 2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/affs/file.c2007-04-09 18:18:23.0 
-0700
@@ -628,11 +628,7 @@ static int affs_prepare_write_ofs(struct
return err;
}
if (to < PAGE_CACHE_SIZE) {
-   char *kaddr = kmap_atomic(page, KM_USER0);
-
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, to, PAGE_CACHE_SIZE - to);
if (size > offset + to) {
if (size < offset + PAGE_CACHE_SIZE)
tmp = size & ~PAGE_CACHE_MASK;


[PATCH 5/13] ext4: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext4/inode.c 
linux-2.6.21-rc6-mm1-test/fs/ext4/inode.c
--- linux-2.6.21-rc6-mm1/fs/ext4/inode.c2007-04-10 17:15:04.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext4/inode.c   2007-04-10 18:33:04.0 
-0700
@@ -1791,7 +1791,6 @@ int ext4_block_truncate_page(handle_t *h
struct inode *inode = mapping->host;
struct buffer_head *bh;
int err = 0;
-   void *kaddr;
 
if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) &&
test_opt(inode->i_sb, EXTENTS) &&
@@ -1808,10 +1807,7 @@ int ext4_block_truncate_page(handle_t *h
 */
if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 ext4_should_writeback_data(inode) && PageUptodate(page)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
set_page_dirty(page);
goto unlock;
}
@@ -1864,11 +1860,7 @@ int ext4_block_truncate_page(handle_t *h
goto unlock;
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-
+   zero_user_page(page, offset, length);
BUFFER_TRACE(bh, "zeroed end of block");
 
err = 0;
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext4/writeback.c 
linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c
--- linux-2.6.21-rc6-mm1/fs/ext4/writeback.c2007-04-10 18:05:52.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c   2007-04-10 
18:33:04.0 -0700
@@ -961,7 +961,6 @@ int ext4_wb_writepage(struct page *page,
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
unsigned offset;
-   void *kaddr;
 
wb_debug("writepage %lu from inode %lu\n", page->index, inode->i_ino);
 
@@ -1011,10 +1010,7 @@ int ext4_wb_writepage(struct page *page,
 * the  page size, the remaining memory is zeroed when mapped, and
 * writes to that region are not written out to the file."
 */
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, PAGE_CACHE_SIZE - offset);
return ext4_wb_write_single_page(page, wbc);
 }
 
@@ -1065,7 +1061,6 @@ int ext4_wb_block_truncate_page(handle_t
struct inode *inode = mapping->host;
struct buffer_head bh, *bhw = &bh;
unsigned blocksize, length;
-   void *kaddr;
int err = 0;
 
wb_debug("partial truncate from %lu on page %lu from inode %lu\n",
@@ -1104,10 +1099,7 @@ int ext4_wb_block_truncate_page(handle_t
}
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
SetPageUptodate(page);
__set_page_dirty_nobuffers(page);
 


[PATCH 4/13] ext3: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext3/inode.c 
linux-2.6.21-rc6-mm1-test/fs/ext3/inode.c
--- linux-2.6.21-rc6-mm1/fs/ext3/inode.c2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext3/inode.c   2007-04-09 18:18:23.0 
-0700
@@ -1767,7 +1767,6 @@ static int ext3_block_truncate_page(hand
struct inode *inode = mapping->host;
struct buffer_head *bh;
int err = 0;
-   void *kaddr;
 
blocksize = inode->i_sb->s_blocksize;
length = blocksize - (offset & (blocksize - 1));
@@ -1779,10 +1778,7 @@ static int ext3_block_truncate_page(hand
 */
if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 ext3_should_writeback_data(inode) && PageUptodate(page)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
set_page_dirty(page);
goto unlock;
}
@@ -1835,11 +1831,7 @@ static int ext3_block_truncate_page(hand
goto unlock;
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-
+   zero_user_page(page, offset, length);
BUFFER_TRACE(bh, "zeroed end of block");
 
err = 0;


[PATCH 6/13] gfs2: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/gfs2/bmap.c 
linux-2.6.21-rc6-mm1-test/fs/gfs2/bmap.c
--- linux-2.6.21-rc6-mm1/fs/gfs2/bmap.c 2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/gfs2/bmap.c2007-04-09 18:18:23.0 
-0700
@@ -885,7 +885,6 @@ static int gfs2_block_truncate_page(stru
unsigned blocksize, iblock, length, pos;
struct buffer_head *bh;
struct page *page;
-   void *kaddr;
int err;
 
page = grab_cache_page(mapping, index);
@@ -933,10 +932,7 @@ static int gfs2_block_truncate_page(stru
if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip))
gfs2_trans_add_bh(ip->i_gl, bh, 0);
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
 
 unlock:
unlock_page(page);


[PATCH 11/13] reiserfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiserfs/file.c 
linux-2.6.21-rc6-mm1-test/fs/reiserfs/file.c
--- linux-2.6.21-rc6-mm1/fs/reiserfs/file.c 2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiserfs/file.c2007-04-09 
18:18:23.0 -0700
@@ -1059,20 +1059,12 @@ static int reiserfs_prepare_file_region_
   maping blocks, since there is none, so we just zero out remaining
   parts of first and last pages in write area (if needed) */
if ((pos & ~((loff_t) PAGE_CACHE_SIZE - 1)) > inode->i_size) {
-   if (from != 0) {/* First page needs to be partially 
zeroed */
-   char *kaddr = kmap_atomic(prepared_pages[0], KM_USER0);
-   memset(kaddr, 0, from);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(prepared_pages[0]);
-   }
-   if (to != PAGE_CACHE_SIZE) {/* Last page needs to be 
partially zeroed */
-   char *kaddr =
-   kmap_atomic(prepared_pages[num_pages - 1],
-   KM_USER0);
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(prepared_pages[num_pages - 1]);
-   }
+   if (from != 0)  /* First page needs to be partially 
zeroed */
+   zero_user_page(prepared_pages[0], 0, from);
+
+   if (to != PAGE_CACHE_SIZE)  /* Last page needs to be 
partially zeroed */
+   zero_user_page(prepared_pages[num_pages-1], to,
+   PAGE_CACHE_SIZE - to);
 
/* Since all blocks are new - use already calculated value */
return blocks;
@@ -1199,13 +1191,9 @@ static int reiserfs_prepare_file_region_
ll_rw_block(READ, 1, &bh);
*wait_bh++ = bh;
} else {/* Not mapped, zero it */
-   char *kaddr =
-   kmap_atomic(prepared_pages[0],
-   KM_USER0);
-   memset(kaddr + block_start, 0,
-  from - block_start);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(prepared_pages[0]);
+   zero_user_page(prepared_pages[0],
+  block_start,
+  from - block_start);
set_buffer_uptodate(bh);
}
}
@@ -1237,13 +1225,8 @@ static int reiserfs_prepare_file_region_
ll_rw_block(READ, 1, &bh);
*wait_bh++ = bh;
} else {/* Not mapped, zero it */
-   char *kaddr =
-   kmap_atomic(prepared_pages
-   [num_pages - 1],
-   KM_USER0);
-   memset(kaddr + to, 0, block_end - to);
-   kunmap_atomic(kaddr, KM_USER0);
-   
flush_dcache_page(prepared_pages[num_pages - 1]);
+   
zero_user_page(prepared_pages[num_pages-1],
+   to, block_end - to);
set_buffer_uptodate(bh);
}
}
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiserfs/inode.c 
linux-2.6.21-rc6-mm1-test/fs/reiserfs/inode.c
--- linux-2.6.21-rc6-mm1/fs/reiserfs/inode.c2007-04-09 10:41:47.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiserfs/inode.c   2007-04-09 
18:18:23.0 -0700
@@ -2148,13 +2148,8 @@ int reiserfs_truncate_file(struct inode 
length = offset & (blocksize - 1);
/* if we are not on a block boundary */
if (length) {
-   char *kaddr;
-
length = blocksize - length;
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_us

[PATCH 10/13] reiser4: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it.  Also replace the (mostly)
redundant zero_page() function.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/cryptcompress.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/cryptcompress.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/cryptcompress.c 2007-04-10 
17:15:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/cryptcompress.c
2007-04-10 18:35:44.0 -0700
@@ -1897,7 +1897,6 @@ static int
 write_hole(struct inode *inode, reiser4_cluster_t * clust, loff_t file_off,
   loff_t to_file)
 {
-   char *data;
int result = 0;
unsigned cl_off, cl_count = 0;
unsigned to_pg, pg_off;
@@ -1934,10 +1933,7 @@ write_hole(struct inode *inode, reiser4_
 
to_pg = min_count(PAGE_CACHE_SIZE - pg_off, cl_count);
lock_page(page);
-   data = kmap_atomic(page, KM_USER0);
-   memset(data + pg_off, 0, to_pg);
-   flush_dcache_page(page);
-   kunmap_atomic(data, KM_USER0);
+   zero_user_page(page, pg_off, to_pg);
SetPageUptodate(page);
unlock_page(page);
 
@@ -2167,7 +2163,6 @@ read_some_cluster_pages(struct inode *in
 
if (clust->nr_pages) {
int off;
-   char *data;
struct page * pg;
assert("edward-1419", clust->pages != NULL);
pg = clust->pages[clust->nr_pages - 1];
@@ -2175,10 +2170,7 @@ read_some_cluster_pages(struct inode *in
off = off_to_pgoff(win->off+win->count+win->delta);
if (off) {
lock_page(pg);
-   data = kmap_atomic(pg, KM_USER0);
-   memset(data + off, 0, PAGE_CACHE_SIZE - off);
-   flush_dcache_page(pg);
-   kunmap_atomic(data, KM_USER0);
+   zero_user_page(pg, off, PAGE_CACHE_SIZE - off);
unlock_page(pg);
}
}
@@ -2217,20 +2209,15 @@ read_some_cluster_pages(struct inode *in
(count_to_nrpages(inode->i_size) <= pg->index)) {
/* .. and appended,
   so set zeroes to the rest */
-   char *data;
int offset;
lock_page(pg);
-   data = kmap_atomic(pg, KM_USER0);
-
assert("edward-1260",
   count_to_nrpages(win->off + win->count +
win->delta) - 1 == i);
 
offset =
off_to_pgoff(win->off + win->count + win->delta);
-   memset(data + offset, 0, PAGE_CACHE_SIZE - offset);
-   flush_dcache_page(pg);
-   kunmap_atomic(data, KM_USER0);
+   zero_user_page(pg, offset, PAGE_CACHE_SIZE - offset);
unlock_page(pg);
/* still not uptodate */
break;
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/file.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/file.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/file.c  2007-04-10 
17:15:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/file.c 2007-04-10 
18:35:44.0 -0700
@@ -433,7 +433,6 @@ static int shorten_file(struct inode *in
struct page *page;
int padd_from;
unsigned long index;
-   char *kaddr;
unix_file_info_t *uf_info;
 
/*
@@ -523,10 +522,7 @@ static int shorten_file(struct inode *in
 
lock_page(page);
assert("vs-1066", PageLocked(page));
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + padd_from, 0, PAGE_CACHE_SIZE - padd_from);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, padd_from, PAGE_CACHE_SIZE - padd_from);
unlock_page(page);
page_cache_release(page);
/* the below does up(sbinfo->delete_mutex). Do not get confused */
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/ctail.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/ctail.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/ctail.c 2007-04-10 
17:15:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/ctail.c2007-04-10 
18:35:44.0 -0700
@@ -627,11 +627,7 @@ int do_readpage_ctail(struct inode * ino
 #endif
case FAKE_DISK_CLUSTER:
/* fill the page by zeroes */
- 

[PATCH 9/13] ocfs2: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ocfs2/aops.c 
linux-2.6.21-rc6-mm1-test/fs/ocfs2/aops.c
--- linux-2.6.21-rc6-mm1/fs/ocfs2/aops.c2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ocfs2/aops.c   2007-04-09 18:18:23.0 
-0700
@@ -234,10 +234,7 @@ static int ocfs2_readpage(struct file *f
 * XXX sys_readahead() seems to get that wrong?
 */
if (start >= i_size_read(inode)) {
-   char *addr = kmap(page);
-   memset(addr, 0, PAGE_SIZE);
-   flush_dcache_page(page);
-   kunmap(page);
+   zero_user_page(page, 0, PAGE_SIZE);
SetPageUptodate(page);
ret = 0;
goto out_alloc;


[PATCH 12/13] xfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). 

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/xfs/linux-2.6/xfs_lrw.c 
linux-2.6.21-rc6-mm1-test/fs/xfs/linux-2.6/xfs_lrw.c
--- linux-2.6.21-rc6-mm1/fs/xfs/linux-2.6/xfs_lrw.c 2007-04-09 
17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/xfs/linux-2.6/xfs_lrw.c2007-04-09 
18:18:23.0 -0700
@@ -159,7 +159,7 @@ xfs_iozero(
if (status)
goto unlock;
 
-   memclear_highpage_flush(page, offset, bytes);
+   zero_user_page(page, offset, bytes);
 
status = mapping->a_ops->commit_write(NULL, page, offset,
offset + bytes);


[PATCH 13/13] fs: deprecate memclear_highpage_flush

2007-04-10 Thread Nate Diller
Now that all the in-tree users are converted over to zero_user_page(),
deprecate the old memclear_highpage_flush() call.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---
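
Out-of-tree callers keep compiling against the wrapper below and only pick up
a compile-time warning; converting them later is the same mechanical change
made throughout this series, e.g. (illustrative hunk, not from any real file):

    -	memclear_highpage_flush(page, offset, len);
    +	zero_user_page(page, offset, len);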

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/include/linux/highmem.h 
linux-2.6.21-rc6-mm1-test/include/linux/highmem.h
--- linux-2.6.21-rc6-mm1/include/linux/highmem.h2007-04-10 
18:32:41.0 -0700
+++ linux-2.6.21-rc6-mm1-test/include/linux/highmem.h   2007-04-10 
19:40:14.0 -0700
@@ -149,6 +149,8 @@ static inline void zero_user_page(struct
kunmap_atomic(kaddr, KM_USER0);
 }
 
+static void memclear_highpage_flush(struct page *page, unsigned int offset,
+   unsigned int size) __deprecated;
 static inline void memclear_highpage_flush(struct page *page, unsigned int 
offset, unsigned int size)
 {
return zero_user_page(page, offset, size);


[PATCH 7/13] nfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/nfs/read.c 
linux-2.6.21-rc6-mm1-test/fs/nfs/read.c
--- linux-2.6.21-rc6-mm1/fs/nfs/read.c  2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/nfs/read.c 2007-04-09 18:18:23.0 
-0700
@@ -79,7 +79,7 @@ void nfs_readdata_release(void *data)
 static
 int nfs_return_empty_page(struct page *page)
 {
-   memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
+   zero_user_page(page, 0, PAGE_CACHE_SIZE);
SetPageUptodate(page);
unlock_page(page);
return 0;
@@ -103,10 +103,10 @@ static void nfs_readpage_truncate_uninit
pglen = PAGE_CACHE_SIZE - base;
for (;;) {
if (remainder <= pglen) {
-   memclear_highpage_flush(*pages, base, remainder);
+   zero_user_page(*pages, base, remainder);
break;
}
-   memclear_highpage_flush(*pages, base, pglen);
+   zero_user_page(*pages, base, pglen);
pages++;
remainder -= pglen;
pglen = PAGE_CACHE_SIZE;
@@ -130,7 +130,7 @@ static int nfs_readpage_async(struct nfs
return PTR_ERR(new);
}
if (len < PAGE_CACHE_SIZE)
-   memclear_highpage_flush(page, len, PAGE_CACHE_SIZE - len);
+   zero_user_page(page, len, PAGE_CACHE_SIZE - len);
 
nfs_list_add_request(new, &one_request);
nfs_pagein_one(&one_request, inode);
@@ -561,7 +561,7 @@ readpage_async_filler(void *data, struct
return PTR_ERR(new);
}
if (len < PAGE_CACHE_SIZE)
-   memclear_highpage_flush(page, len, PAGE_CACHE_SIZE - len);
+   zero_user_page(page, len, PAGE_CACHE_SIZE - len);
nfs_list_add_request(new, desc->head);
return 0;
 }
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/nfs/write.c 
linux-2.6.21-rc6-mm1-test/fs/nfs/write.c
--- linux-2.6.21-rc6-mm1/fs/nfs/write.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/nfs/write.c2007-04-09 18:18:23.0 
-0700
@@ -169,7 +169,7 @@ static void nfs_mark_uptodate(struct pag
if (count != nfs_page_length(page))
return;
if (count != PAGE_CACHE_SIZE)
-   memclear_highpage_flush(page, count, PAGE_CACHE_SIZE - count);
+   zero_user_page(page, count, PAGE_CACHE_SIZE - count);
SetPageUptodate(page);
 }
 


[PATCH 3/13] ecryptfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ecryptfs/mmap.c 
linux-2.6.21-rc6-mm1-test/fs/ecryptfs/mmap.c
--- linux-2.6.21-rc6-mm1/fs/ecryptfs/mmap.c 2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ecryptfs/mmap.c2007-04-09 
18:19:34.0 -0700
@@ -364,18 +364,14 @@ static int fill_zeros_to_end_of_page(str
 {
struct inode *inode = page->mapping->host;
int end_byte_in_page;
-   char *page_virt;
 
if ((i_size_read(inode) / PAGE_CACHE_SIZE) != page->index)
goto out;
end_byte_in_page = i_size_read(inode) % PAGE_CACHE_SIZE;
if (to > end_byte_in_page)
end_byte_in_page = to;
-   page_virt = kmap_atomic(page, KM_USER0);
-   memset((page_virt + end_byte_in_page), 0,
-  (PAGE_CACHE_SIZE - end_byte_in_page));
-   kunmap_atomic(page_virt, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, end_byte_in_page,
+   PAGE_CACHE_SIZE - end_byte_in_page);
 out:
return 0;
 }
@@ -740,7 +736,6 @@ int write_zeros(struct file *file, pgoff
 {
int rc = 0;
struct page *tmp_page;
-   char *tmp_page_virt;
 
tmp_page = ecryptfs_get1page(file, index);
if (IS_ERR(tmp_page)) {
@@ -757,10 +752,7 @@ int write_zeros(struct file *file, pgoff
page_cache_release(tmp_page);
goto out;
}
-   tmp_page_virt = kmap_atomic(tmp_page, KM_USER0);
-   memset(((char *)tmp_page_virt + start), 0, num_zeros);
-   kunmap_atomic(tmp_page_virt, KM_USER0);
-   flush_dcache_page(tmp_page);
+   zero_user_page(tmp_page, start, num_zeros);
rc = ecryptfs_commit_write(file, tmp_page, start, start + num_zeros);
if (rc < 0) {
ecryptfs_printk(KERN_ERR, "Error attempting to write zero's "


[PATCH 1/13] fs: convert core functions to zero_user_page

2007-04-10 Thread Nate Diller
It's very common for file systems to need to zero part or all of a page; the
simplest way is just to use kmap_atomic() and memset().  There's actually a
library function in include/linux/highmem.h that does exactly that, but it's
confusingly named memclear_highpage_flush(), which is descriptive of *how*
it does the work rather than what the *purpose* is.  So this patchset
renames the function to zero_user_page(), and calls it from the various
places that currently open code it.

This first patch introduces the new function call, and converts all the core
kernel callsites, both the open-coded ones and the old
memclear_highpage_flush() ones.  Following this patch is a series of
conversions for each file system individually, per AKPM, and finally a patch
deprecating the old call.  The diffstat below shows the entire patchset.

Compile tested in x86_64.

signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 drivers/block/loop.c |6 ---
 fs/affs/file.c   |6 ---
 fs/buffer.c  |   53 +--
 fs/direct-io.c   |8 +---
 fs/ecryptfs/mmap.c   |   14 +---
 fs/ext3/inode.c  |   12 +--
 fs/ext4/inode.c  |   12 +--
 fs/ext4/writeback.c  |   12 +--
 fs/gfs2/bmap.c   |6 ---
 fs/mpage.c   |   11 +-
 fs/nfs/read.c|   10 ++---
 fs/nfs/write.c   |2 -
 fs/ntfs/aops.c   |   26 ++-
 fs/ntfs/file.c   |   47 +--
 fs/ocfs2/aops.c  |5 --
 fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
 fs/reiser4/plugin/file/file.c|6 ---
 fs/reiser4/plugin/item/ctail.c   |6 ---
 fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
 fs/reiser4/plugin/item/tail.c|8 +---
 fs/reiserfs/file.c   |   39 ++
 fs/reiserfs/inode.c  |   13 +--
 fs/xfs/linux-2.6/xfs_lrw.c   |2 -
 include/linux/highmem.h  |7 +++-
 mm/filemap_xip.c |7 
 mm/truncate.c|2 -
 26 files changed, 82 insertions(+), 276 deletions(-)

---
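
For readers without the new header handy: judging from the open-coded
sequences this series removes, zero_user_page() is essentially the sketch
below (the exact include/linux/highmem.h text may differ slightly):

    static inline void zero_user_page(struct page *page, unsigned int offset,
                                      unsigned int size)
    {
            void *kaddr = kmap_atomic(page, KM_USER0);

            memset(kaddr + offset, 0, size);
            flush_dcache_page(page);
            kunmap_atomic(kaddr, KM_USER0);
    }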

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/drivers/block/loop.c 
linux-2.6.21-rc6-mm1-test/drivers/block/loop.c
--- linux-2.6.21-rc6-mm1/drivers/block/loop.c   2007-04-10 18:27:04.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/drivers/block/loop.c  2007-04-10 
18:18:16.0 -0700
@@ -244,17 +244,13 @@ static int do_lo_send_aops(struct loop_d
transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec->bv_page, bv_offs, size, IV);
if (unlikely(transfer_result)) {
-   char *kaddr;
-
/*
 * The transfer failed, but we still write the data to
 * keep prepare/commit calls balanced.
 */
printk(KERN_ERR "loop: transfer error block %llu\n",
   (unsigned long long)index);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, size);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, size);
}
flush_dcache_page(page);
ret = aops->commit_write(file, page, offset,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/buffer.c 
linux-2.6.21-rc6-mm1-test/fs/buffer.c
--- linux-2.6.21-rc6-mm1/fs/buffer.c2007-04-10 18:27:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/buffer.c   2007-04-10 18:18:16.0 
-0700
@@ -1862,13 +1862,8 @@ static int __block_prepare_write(struct 
if (block_start >= to)
break;
if (buffer_new(bh)) {
-   void *kaddr;
-
clear_buffer_new(bh);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr+block_start, 0, bh->b_size);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, block_start, bh->b_size);
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
}
@@ -1956,10 +1951,7 @@ int block_read_full_page(struct page *pa
SetPageError(page);
}
if (!buffer_mapped(bh)) {
-   void *kaddr = kmap_atomic(page, KM_USER0);
-

Re: [PATCH 1/2] fs: use memclear_highpage_flush to zero page data

2007-04-10 Thread Nate Diller

On 4/10/07, Anton Altaparmakov <[EMAIL PROTECTED]> wrote:

On 10 Apr 2007, at 07:10, Andrew Morton wrote:
> On Mon, 09 Apr 2007 21:31:37 -0700 Nate Diller
> <[EMAIL PROTECTED]> wrote:
>> It's very common for file systems to need to zero part or all of a
>> page, the
>> simplest way is just to use kmap_atomic() and memset().  There's
>> actually a
>> library function in include/linux/highmem.h that does exactly
>> that, but it's
>> confusingly named memclear_highpage_flush(), which is descriptive
>> of *how*
>> it does the work rather than what the *purpose* is.  So this patch
>> renames
>> the function to zero_page_data(), and calls it from the various
>> places that
>> currently open code it.
>>
>> Compile tested in x86_64.
>>
>> signed-off-by: Nate Diller <[EMAIL PROTECTED]>
>>
>> ---
>>
>>  drivers/block/loop.c |6 ---
>>  fs/affs/file.c   |6 ---
>>  fs/buffer.c  |   53 
>> +--
>>  fs/direct-io.c   |8 +---
>>  fs/ecryptfs/mmap.c   |   14 +---
>>  fs/ext3/inode.c  |   12 +--
>>  fs/ext4/inode.c  |   12 +--
>>  fs/ext4/writeback.c  |   12 +--
>>  fs/gfs2/bmap.c   |6 ---
>>  fs/mpage.c   |   11 +-
>>  fs/nfs/read.c|   10 ++---
>>  fs/nfs/write.c   |2 -
>>  fs/ntfs/aops.c   |   32 +++---
>>  fs/ntfs/file.c   |   47 
>> +--
>>  fs/ocfs2/aops.c  |5 --
>>  fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
>>  fs/reiser4/plugin/file/file.c|6 ---
>>  fs/reiser4/plugin/item/ctail.c   |6 ---
>>  fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
>>  fs/reiser4/plugin/item/tail.c|8 +---
>>  fs/reiserfs/file.c   |   39 +
>> +
>>  fs/reiserfs/inode.c  |   13 +--
>>  fs/xfs/linux-2.6/xfs_lrw.c   |2 -
>>  include/linux/highmem.h  |2 -
>>  mm/filemap_xip.c |7 
>>  mm/truncate.c|2 -
>>  26 files changed, 78 insertions(+), 281 deletions(-)
>>
>
> Not sure that I agree with the name zero_page_data().  People might
> use it
> to, err, zero a page's data.  Whereas it is really only for use
> against
> *user* pages.   zero_user_page(), perhaps.
>
> Plus..
>
> This patch as presented causes me surprising amounts of trouble.  I
> need to
> split it up into
>
>   - core plus filesystems which don't have maintainers (for me to
> merge)
>
>   - filesystems which do have maintainers (one patch per), for
> maintainers to merge.
>
>   - another patch for reiser4, to remain in -mm.
>
> And this is actually not possible to do, because my merge and the
> subsystem
> maintainers' merges will happen at different times.  In the
> intervening
> window, the kernel won't compile.
>
> So instead I need to
>
>   - split off the reiser4 bit
>
>   - get acks from fs maintainers on the rest
>
>   - merge the whole thing in one hit (minus reiser4)
>
> And I can do that, but it is the less preferable option.
>
>
> The better way to do this merge is:
>
> patch #1:
>
> static inline void memclear_highpage_flush(...) __deprecated
> {
>   zero_user_page(...);
> }
>
> patch #2..n:  convert filesystems.
>
>
> then, when all filesystems are converted, we're ready to remove
> memclear_highpage_flush().  But we do that six months later - let's
> not
> screw out-of-tree fs maintainers (and their users) unnecessarily.

Nate, I think you either do not understand what the KM_* constants
passed to kmap_atomic() mean or you were overeager in your code
replacement...  You really, really cannot replace KM_BIO_SRC_IRQ with
KM_USER0 in the NTFS i/o completion handler without trashing people's
data left, right and centre!


good catch, I was indeed careless on that one.  I just double checked
all the other changes and that was the only non-KM_USER0 that slipped
through.  Thanks!

I will submit a new patch later today that fixes this problem and the
issues AKPM raised.

NATE


Re: [PATCH 1/2] fs: use memclear_highpage_flush to zero page data

2007-04-10 Thread Nate Diller

On 4/10/07, Anton Altaparmakov [EMAIL PROTECTED] wrote:

On 10 Apr 2007, at 07:10, Andrew Morton wrote:
 On Mon, 09 Apr 2007 21:31:37 -0700 Nate Diller
 [EMAIL PROTECTED] wrote:
 It's very common for file systems to need to zero part or all of a
 page, the
 simplist way is just to use kmap_atomic() and memset().  There's
 actually a
 library function in include/linux/highmem.h that does exactly
 that, but it's
 confusingly named memclear_highpage_flush(), which is descriptive
 of *how*
 it does the work rather than what the *purpose* is.  So this patch
 renames
 the function to zero_page_data(), and calls it from the various
 places that
 currently open code it.

 Compile tested in x86_64.

 signed-off-by: Nate Diller [EMAIL PROTECTED]

 ---

  drivers/block/loop.c |6 ---
  fs/affs/file.c   |6 ---
  fs/buffer.c  |   53 
 +--
  fs/direct-io.c   |8 +---
  fs/ecryptfs/mmap.c   |   14 +---
  fs/ext3/inode.c  |   12 +--
  fs/ext4/inode.c  |   12 +--
  fs/ext4/writeback.c  |   12 +--
  fs/gfs2/bmap.c   |6 ---
  fs/mpage.c   |   11 +-
  fs/nfs/read.c|   10 ++---
  fs/nfs/write.c   |2 -
  fs/ntfs/aops.c   |   32 +++---
  fs/ntfs/file.c   |   47 
 +--
  fs/ocfs2/aops.c  |5 --
  fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
  fs/reiser4/plugin/file/file.c|6 ---
  fs/reiser4/plugin/item/ctail.c   |6 ---
  fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
  fs/reiser4/plugin/item/tail.c|8 +---
  fs/reiserfs/file.c   |   39 +
 +
  fs/reiserfs/inode.c  |   13 +--
  fs/xfs/linux-2.6/xfs_lrw.c   |2 -
  include/linux/highmem.h  |2 -
  mm/filemap_xip.c |7 
  mm/truncate.c|2 -
  26 files changed, 78 insertions(+), 281 deletions(-)


 Not sure that I agree with the name zero_page_data().  People might
 use it
 to, err, zero a page's data.  Whereas it is really only for use
 against
 *user* pages.   zero_user_page(), perhaps.

 Plus..

 This patch as presented causes me surprising amounts of trouble.  I
 need to
 split it up into

   - core plus filesystems which don't have maintainers (for me to
 merge)

   - filesystems which do have maintainers (one patch per), for
 maintainers to merge.

   - another patch for reiser4, to remain in -mm.

 And this is actually not possible to do, because my merge and the
 subsystem
 maintainers' merges will happen at different times.  In the
 intervening
 window, the kernel won't compile.

 So instead I need to

   - split off the reiser4 bit

   - get acks from fs maintainers on the rest

   - merge the whole thing in one hit (minus reiser4)

 And I can do that, but it is the less preferable option.


 The better way to do this merge is:

 patch #1:

 static inline void memclear_highpage_flush(...) __deprecated
 {
   zero_user_page(...);
 }

 patch #2..n:  convert filesystems.


 then, when all filesystems are converted, we're ready to remove
 memclear_highpage_flush().  But we do that six months later - let's
 not
 screw out-of-tree fs maintainers (and their users) unnecessarily.

Nate, I think you either do not understand what the KM_* constants
passed to kmap_atomic() mean or you were overeager in your code
replacement...  You really, really cannot replace KM_BIO_SRC_IRQ with
KM_USER0 in the NTFS i/o completion handler without trashing people's
data left right an centre!


good catch, I was indeed careless on that one.  I just double checked
all the other changes and that was the only non-KM_USER0 that slipped
through.  Thanks!

I will submit a new patch later today that fixes this problem and the
issues AKPM raised.

NATE
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/13] fs: convert core functions to zero_user_page

2007-04-10 Thread Nate Diller
It's very common for file systems to need to zero part or all of a page, the
simplist way is just to use kmap_atomic() and memset().  There's actually a
library function in include/linux/highmem.h that does exactly that, but it's
confusingly named memclear_highpage_flush(), which is descriptive of *how*
it does the work rather than what the *purpose* is.  So this patchset
renames the function to zero_user_page(), and calls it from the various
places that currently open code it.

This first patch introduces the new function call, and converts all the core
kernel callsites, both the open-coded ones and the old
memclear_highpage_flush() ones.  Following this patch is a series of
conversions for each file system individually, per AKPM, and finally a patch
deprecating the old call.  The diffstat below shows the entire patchset.

Compile tested in x86_64.

signed-off-by: Nate Diller [EMAIL PROTECTED]

---

 drivers/block/loop.c |6 ---
 fs/affs/file.c   |6 ---
 fs/buffer.c  |   53 +--
 fs/direct-io.c   |8 +---
 fs/ecryptfs/mmap.c   |   14 +---
 fs/ext3/inode.c  |   12 +--
 fs/ext4/inode.c  |   12 +--
 fs/ext4/writeback.c  |   12 +--
 fs/gfs2/bmap.c   |6 ---
 fs/mpage.c   |   11 +-
 fs/nfs/read.c|   10 ++---
 fs/nfs/write.c   |2 -
 fs/ntfs/aops.c   |   26 ++-
 fs/ntfs/file.c   |   47 +--
 fs/ocfs2/aops.c  |5 --
 fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
 fs/reiser4/plugin/file/file.c|6 ---
 fs/reiser4/plugin/item/ctail.c   |6 ---
 fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
 fs/reiser4/plugin/item/tail.c|8 +---
 fs/reiserfs/file.c   |   39 ++
 fs/reiserfs/inode.c  |   13 +--
 fs/xfs/linux-2.6/xfs_lrw.c   |2 -
 include/linux/highmem.h  |7 +++-
 mm/filemap_xip.c |7 
 mm/truncate.c|2 -
 26 files changed, 82 insertions(+), 276 deletions(-)

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/drivers/block/loop.c 
linux-2.6.21-rc6-mm1-test/drivers/block/loop.c
--- linux-2.6.21-rc6-mm1/drivers/block/loop.c   2007-04-10 18:27:04.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/drivers/block/loop.c  2007-04-10 
18:18:16.0 -0700
@@ -244,17 +244,13 @@ static int do_lo_send_aops(struct loop_d
transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec-bv_page, bv_offs, size, IV);
if (unlikely(transfer_result)) {
-   char *kaddr;
-
/*
 * The transfer failed, but we still write the data to
 * keep prepare/commit calls balanced.
 */
printk(KERN_ERR loop: transfer error block %llu\n,
   (unsigned long long)index);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, size);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, size);
}
flush_dcache_page(page);
ret = aops-commit_write(file, page, offset,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/buffer.c 
linux-2.6.21-rc6-mm1-test/fs/buffer.c
--- linux-2.6.21-rc6-mm1/fs/buffer.c2007-04-10 18:27:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/buffer.c   2007-04-10 18:18:16.0 
-0700
@@ -1862,13 +1862,8 @@ static int __block_prepare_write(struct 
if (block_start = to)
break;
if (buffer_new(bh)) {
-   void *kaddr;
-
clear_buffer_new(bh);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr+block_start, 0, bh-b_size);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, block_start, bh-b_size);
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
}
@@ -1956,10 +1951,7 @@ int block_read_full_page(struct page *pa
SetPageError(page);
}
if (!buffer_mapped(bh)) {
-   void *kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + i

[PATCH 3/13] ecryptfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ecryptfs/mmap.c 
linux-2.6.21-rc6-mm1-test/fs/ecryptfs/mmap.c
--- linux-2.6.21-rc6-mm1/fs/ecryptfs/mmap.c 2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ecryptfs/mmap.c2007-04-09 
18:19:34.0 -0700
@@ -364,18 +364,14 @@ static int fill_zeros_to_end_of_page(str
 {
struct inode *inode = page-mapping-host;
int end_byte_in_page;
-   char *page_virt;
 
if ((i_size_read(inode) / PAGE_CACHE_SIZE) != page-index)
goto out;
end_byte_in_page = i_size_read(inode) % PAGE_CACHE_SIZE;
if (to  end_byte_in_page)
end_byte_in_page = to;
-   page_virt = kmap_atomic(page, KM_USER0);
-   memset((page_virt + end_byte_in_page), 0,
-  (PAGE_CACHE_SIZE - end_byte_in_page));
-   kunmap_atomic(page_virt, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, end_byte_in_page,
+   PAGE_CACHE_SIZE - end_byte_in_page);
 out:
return 0;
 }
@@ -740,7 +736,6 @@ int write_zeros(struct file *file, pgoff
 {
int rc = 0;
struct page *tmp_page;
-   char *tmp_page_virt;
 
tmp_page = ecryptfs_get1page(file, index);
if (IS_ERR(tmp_page)) {
@@ -757,10 +752,7 @@ int write_zeros(struct file *file, pgoff
page_cache_release(tmp_page);
goto out;
}
-   tmp_page_virt = kmap_atomic(tmp_page, KM_USER0);
-   memset(((char *)tmp_page_virt + start), 0, num_zeros);
-   kunmap_atomic(tmp_page_virt, KM_USER0);
-   flush_dcache_page(tmp_page);
+   zero_user_page(tmp_page, start, num_zeros);
rc = ecryptfs_commit_write(file, tmp_page, start, start + num_zeros);
if (rc  0) {
ecryptfs_printk(KERN_ERR, Error attempting to write zero's 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/13] nfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush().

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/nfs/read.c 
linux-2.6.21-rc6-mm1-test/fs/nfs/read.c
--- linux-2.6.21-rc6-mm1/fs/nfs/read.c  2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/nfs/read.c 2007-04-09 18:18:23.0 
-0700
@@ -79,7 +79,7 @@ void nfs_readdata_release(void *data)
 static
 int nfs_return_empty_page(struct page *page)
 {
-   memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
+   zero_user_page(page, 0, PAGE_CACHE_SIZE);
SetPageUptodate(page);
unlock_page(page);
return 0;
@@ -103,10 +103,10 @@ static void nfs_readpage_truncate_uninit
pglen = PAGE_CACHE_SIZE - base;
for (;;) {
if (remainder = pglen) {
-   memclear_highpage_flush(*pages, base, remainder);
+   zero_user_page(*pages, base, remainder);
break;
}
-   memclear_highpage_flush(*pages, base, pglen);
+   zero_user_page(*pages, base, pglen);
pages++;
remainder -= pglen;
pglen = PAGE_CACHE_SIZE;
@@ -130,7 +130,7 @@ static int nfs_readpage_async(struct nfs
return PTR_ERR(new);
}
if (len  PAGE_CACHE_SIZE)
-   memclear_highpage_flush(page, len, PAGE_CACHE_SIZE - len);
+   zero_user_page(page, len, PAGE_CACHE_SIZE - len);
 
nfs_list_add_request(new, one_request);
nfs_pagein_one(one_request, inode);
@@ -561,7 +561,7 @@ readpage_async_filler(void *data, struct
return PTR_ERR(new);
}
if (len  PAGE_CACHE_SIZE)
-   memclear_highpage_flush(page, len, PAGE_CACHE_SIZE - len);
+   zero_user_page(page, len, PAGE_CACHE_SIZE - len);
nfs_list_add_request(new, desc-head);
return 0;
 }
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/nfs/write.c 
linux-2.6.21-rc6-mm1-test/fs/nfs/write.c
--- linux-2.6.21-rc6-mm1/fs/nfs/write.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/nfs/write.c2007-04-09 18:18:23.0 
-0700
@@ -169,7 +169,7 @@ static void nfs_mark_uptodate(struct pag
if (count != nfs_page_length(page))
return;
if (count != PAGE_CACHE_SIZE)
-   memclear_highpage_flush(page, count, PAGE_CACHE_SIZE - count);
+   zero_user_page(page, count, PAGE_CACHE_SIZE - count);
SetPageUptodate(page);
 }
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/13] reiser4: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it.  Also replace the (mostly)
redundant zero_page() function.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/cryptcompress.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/cryptcompress.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/cryptcompress.c 2007-04-10 
17:15:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/cryptcompress.c
2007-04-10 18:35:44.0 -0700
@@ -1897,7 +1897,6 @@ static int
 write_hole(struct inode *inode, reiser4_cluster_t * clust, loff_t file_off,
   loff_t to_file)
 {
-   char *data;
int result = 0;
unsigned cl_off, cl_count = 0;
unsigned to_pg, pg_off;
@@ -1934,10 +1933,7 @@ write_hole(struct inode *inode, reiser4_
 
to_pg = min_count(PAGE_CACHE_SIZE - pg_off, cl_count);
lock_page(page);
-   data = kmap_atomic(page, KM_USER0);
-   memset(data + pg_off, 0, to_pg);
-   flush_dcache_page(page);
-   kunmap_atomic(data, KM_USER0);
+   zero_user_page(page, pg_off, to_pg);
SetPageUptodate(page);
unlock_page(page);
 
@@ -2167,7 +2163,6 @@ read_some_cluster_pages(struct inode *in
 
if (clust->nr_pages) {
int off;
-   char *data;
struct page * pg;
assert("edward-1419", clust->pages != NULL);
pg = clust->pages[clust->nr_pages - 1];
@@ -2175,10 +2170,7 @@ read_some_cluster_pages(struct inode *in
off = off_to_pgoff(win->off+win->count+win->delta);
if (off) {
lock_page(pg);
-   data = kmap_atomic(pg, KM_USER0);
-   memset(data + off, 0, PAGE_CACHE_SIZE - off);
-   flush_dcache_page(pg);
-   kunmap_atomic(data, KM_USER0);
+   zero_user_page(pg, off, PAGE_CACHE_SIZE - off);
unlock_page(pg);
}
}
@@ -2217,20 +2209,15 @@ read_some_cluster_pages(struct inode *in
(count_to_nrpages(inode->i_size) <= pg->index)) {
/* .. and appended,
   so set zeroes to the rest */
-   char *data;
int offset;
lock_page(pg);
-   data = kmap_atomic(pg, KM_USER0);
-
assert("edward-1260",
   count_to_nrpages(win->off + win->count +
win->delta) - 1 == i);
 
offset =
off_to_pgoff(win->off + win->count + win->delta);
-   memset(data + offset, 0, PAGE_CACHE_SIZE - offset);
-   flush_dcache_page(pg);
-   kunmap_atomic(data, KM_USER0);
+   zero_user_page(pg, offset, PAGE_CACHE_SIZE - offset);
unlock_page(pg);
/* still not uptodate */
break;
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/file.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/file.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/file.c  2007-04-10 
17:15:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/file.c 2007-04-10 
18:35:44.0 -0700
@@ -433,7 +433,6 @@ static int shorten_file(struct inode *in
struct page *page;
int padd_from;
unsigned long index;
-   char *kaddr;
unix_file_info_t *uf_info;
 
/*
@@ -523,10 +522,7 @@ static int shorten_file(struct inode *in
 
lock_page(page);
assert("vs-1066", PageLocked(page));
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + padd_from, 0, PAGE_CACHE_SIZE - padd_from);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, padd_from, PAGE_CACHE_SIZE - padd_from);
unlock_page(page);
page_cache_release(page);
/* the below does up(sbinfo->delete_mutex). Do not get confused */
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/ctail.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/ctail.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/ctail.c 2007-04-10 
17:15:04.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/ctail.c2007-04-10 
18:35:44.0 -0700
@@ -627,11 +627,7 @@ int do_readpage_ctail(struct inode * ino
 #endif
case FAKE_DISK_CLUSTER:
/* fill the page by zeroes */
-   data = kmap_atomic(page, KM_USER0);
-
-   memset(data, 0, PAGE_CACHE_SIZE

[PATCH 9/13] ocfs2: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ocfs2/aops.c 
linux-2.6.21-rc6-mm1-test/fs/ocfs2/aops.c
--- linux-2.6.21-rc6-mm1/fs/ocfs2/aops.c2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ocfs2/aops.c   2007-04-09 18:18:23.0 
-0700
@@ -234,10 +234,7 @@ static int ocfs2_readpage(struct file *f
 * XXX sys_readahead() seems to get that wrong?
 */
if (start >= i_size_read(inode)) {
-   char *addr = kmap(page);
-   memset(addr, 0, PAGE_SIZE);
-   flush_dcache_page(page);
-   kunmap(page);
+   zero_user_page(page, 0, PAGE_SIZE);
SetPageUptodate(page);
ret = 0;
goto out_alloc;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/13] xfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/xfs/linux-2.6/xfs_lrw.c 
linux-2.6.21-rc6-mm1-test/fs/xfs/linux-2.6/xfs_lrw.c
--- linux-2.6.21-rc6-mm1/fs/xfs/linux-2.6/xfs_lrw.c 2007-04-09 
17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/xfs/linux-2.6/xfs_lrw.c2007-04-09 
18:18:23.0 -0700
@@ -159,7 +159,7 @@ xfs_iozero(
if (status)
goto unlock;
 
-   memclear_highpage_flush(page, offset, bytes);
+   zero_user_page(page, offset, bytes);
 
status = mapping->a_ops->commit_write(NULL, page, offset,
offset + bytes);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 13/13] fs: deprecate memclear_highpage_flush

2007-04-10 Thread Nate Diller
Now that all the in-tree users are converted over to zero_user_page(),
deprecate the old memclear_highpage_flush() call.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/include/linux/highmem.h 
linux-2.6.21-rc6-mm1-test/include/linux/highmem.h
--- linux-2.6.21-rc6-mm1/include/linux/highmem.h2007-04-10 
18:32:41.0 -0700
+++ linux-2.6.21-rc6-mm1-test/include/linux/highmem.h   2007-04-10 
19:40:14.0 -0700
@@ -149,6 +149,8 @@ static inline void zero_user_page(struct
kunmap_atomic(kaddr, KM_USER0);
 }
 
+static void memclear_highpage_flush(struct page *page, unsigned int offset,
+   unsigned int size) __deprecated;
 static inline void memclear_highpage_flush(struct page *page, unsigned int 
offset, unsigned int size)
 {
return zero_user_page(page, offset, size);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/13] ext3: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext3/inode.c 
linux-2.6.21-rc6-mm1-test/fs/ext3/inode.c
--- linux-2.6.21-rc6-mm1/fs/ext3/inode.c2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext3/inode.c   2007-04-09 18:18:23.0 
-0700
@@ -1767,7 +1767,6 @@ static int ext3_block_truncate_page(hand
struct inode *inode = mapping->host;
struct buffer_head *bh;
int err = 0;
-   void *kaddr;
 
blocksize = inode->i_sb->s_blocksize;
length = blocksize - (offset & (blocksize - 1));
@@ -1779,10 +1778,7 @@ static int ext3_block_truncate_page(hand
 */
if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 ext3_should_writeback_data(inode) && PageUptodate(page)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
set_page_dirty(page);
goto unlock;
}
@@ -1835,11 +1831,7 @@ static int ext3_block_truncate_page(hand
goto unlock;
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-
+   zero_user_page(page, offset, length);
BUFFER_TRACE(bh, "zeroed end of block");
 
err = 0;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/13] gfs2: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/gfs2/bmap.c 
linux-2.6.21-rc6-mm1-test/fs/gfs2/bmap.c
--- linux-2.6.21-rc6-mm1/fs/gfs2/bmap.c 2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/gfs2/bmap.c2007-04-09 18:18:23.0 
-0700
@@ -885,7 +885,6 @@ static int gfs2_block_truncate_page(stru
unsigned blocksize, iblock, length, pos;
struct buffer_head *bh;
struct page *page;
-   void *kaddr;
int err;
 
page = grab_cache_page(mapping, index);
@@ -933,10 +932,7 @@ static int gfs2_block_truncate_page(stru
if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip))
gfs2_trans_add_bh(ip->i_gl, bh, 0);
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
 
 unlock:
unlock_page(page);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 11/13] reiserfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiserfs/file.c 
linux-2.6.21-rc6-mm1-test/fs/reiserfs/file.c
--- linux-2.6.21-rc6-mm1/fs/reiserfs/file.c 2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiserfs/file.c2007-04-09 
18:18:23.0 -0700
@@ -1059,20 +1059,12 @@ static int reiserfs_prepare_file_region_
   maping blocks, since there is none, so we just zero out remaining
   parts of first and last pages in write area (if needed) */
if ((pos & ~((loff_t) PAGE_CACHE_SIZE - 1)) > inode->i_size) {
-   if (from != 0) {/* First page needs to be partially 
zeroed */
-   char *kaddr = kmap_atomic(prepared_pages[0], KM_USER0);
-   memset(kaddr, 0, from);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(prepared_pages[0]);
-   }
-   if (to != PAGE_CACHE_SIZE) {/* Last page needs to be 
partially zeroed */
-   char *kaddr =
-   kmap_atomic(prepared_pages[num_pages - 1],
-   KM_USER0);
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(prepared_pages[num_pages - 1]);
-   }
+   if (from != 0)  /* First page needs to be partially 
zeroed */
+   zero_user_page(prepared_pages[0], 0, from);
+
+   if (to != PAGE_CACHE_SIZE)  /* Last page needs to be 
partially zeroed */
+   zero_user_page(prepared_pages[num_pages-1], to,
+   PAGE_CACHE_SIZE - to);
 
/* Since all blocks are new - use already calculated value */
return blocks;
@@ -1199,13 +1191,9 @@ static int reiserfs_prepare_file_region_
ll_rw_block(READ, 1, &bh);
*wait_bh++ = bh;
} else {/* Not mapped, zero it */
-   char *kaddr =
-   kmap_atomic(prepared_pages[0],
-   KM_USER0);
-   memset(kaddr + block_start, 0,
-  from - block_start);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(prepared_pages[0]);
+   zero_user_page(prepared_pages[0],
+  block_start,
+  from - block_start);
set_buffer_uptodate(bh);
}
}
@@ -1237,13 +1225,8 @@ static int reiserfs_prepare_file_region_
ll_rw_block(READ, 1, &bh);
*wait_bh++ = bh;
} else {/* Not mapped, zero it */
-   char *kaddr =
-   kmap_atomic(prepared_pages
-   [num_pages - 1],
-   KM_USER0);
-   memset(kaddr + to, 0, block_end - to);
-   kunmap_atomic(kaddr, KM_USER0);
-   
flush_dcache_page(prepared_pages[num_pages - 1]);
+   
zero_user_page(prepared_pages[num_pages-1],
+   to, block_end - to);
set_buffer_uptodate(bh);
}
}
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiserfs/inode.c 
linux-2.6.21-rc6-mm1-test/fs/reiserfs/inode.c
--- linux-2.6.21-rc6-mm1/fs/reiserfs/inode.c2007-04-09 10:41:47.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiserfs/inode.c   2007-04-09 
18:18:23.0 -0700
@@ -2148,13 +2148,8 @@ int reiserfs_truncate_file(struct inode 
length = offset & (blocksize - 1);
/* if we are not on a block boundary */
if (length) {
-   char *kaddr;
-
length = blocksize - length;
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length

[PATCH 8/13] ntfs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ntfs/aops.c 
linux-2.6.21-rc6-mm1-test/fs/ntfs/aops.c
--- linux-2.6.21-rc6-mm1/fs/ntfs/aops.c 2007-04-09 10:41:47.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/ntfs/aops.c2007-04-09 18:18:23.0 
-0700
@@ -245,8 +241,7 @@ static int ntfs_read_block(struct page *
rl = NULL;
nr = i = 0;
do {
-   u8 *kaddr;
-   int err;
+   int err = 0;
 
if (unlikely(buffer_uptodate(bh)))
continue;
@@ -254,7 +249,6 @@ static int ntfs_read_block(struct page *
arr[nr++] = bh;
continue;
}
-   err = 0;
bh->b_bdev = vol->sb->s_bdev;
/* Is the block within the allowed limits? */
if (iblock < lblock) {
@@ -340,10 +334,7 @@ handle_hole:
bh->b_blocknr = -1UL;
clear_buffer_mapped(bh);
 handle_zblock:
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + i * blocksize, 0, blocksize);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, i * blocksize, blocksize);
if (likely(!err))
set_buffer_uptodate(bh);
} while (i++, iblock++, (bh = bh->b_this_page) != head);
@@ -460,10 +451,7 @@ retry_readpage:
 * ok to ignore the compressed flag here.
 */
if (unlikely(page->index > 0)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr, 0, PAGE_CACHE_SIZE);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, 0, PAGE_CACHE_SIZE);
goto done;
}
if (!NInoAttr(ni))
@@ -790,14 +778,9 @@ lock_retry_remap:
 * uptodate so it can get discarded by the VM.
 */
if (err == -ENOENT || lcn == LCN_ENOENT) {
-   u8 *kaddr;
-
bh->b_blocknr = -1;
clear_buffer_dirty(bh);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + bh_offset(bh), 0, blocksize);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, bh_offset(bh), blocksize);
set_buffer_uptodate(bh);
err = 0;
continue;
@@ -1422,10 +1405,7 @@ retry_writepage:
if (page->index >= (i_size >> PAGE_CACHE_SHIFT)) {
/* The page straddles i_size. */
unsigned int ofs = i_size & ~PAGE_CACHE_MASK;
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + ofs, 0, PAGE_CACHE_SIZE - ofs);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, ofs, PAGE_CACHE_SIZE - ofs);
}
/* Handle mst protected attributes. */
if (NInoMstProtected(ni))
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ntfs/file.c 
linux-2.6.21-rc6-mm1-test/fs/ntfs/file.c
--- linux-2.6.21-rc6-mm1/fs/ntfs/file.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/ntfs/file.c2007-04-09 18:18:23.0 
-0700
@@ -606,11 +606,8 @@ do_next_page:
ntfs_submit_bh_for_read(bh);
*wait_bh++ = bh;
} else {
-   u8 *kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + bh_offset(bh), 0,
+   zero_user_page(page, bh_offset(bh),
blocksize);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
set_buffer_uptodate(bh);
}
}
@@ -685,12 +682,8 @@ map_buffer_cached:
ntfs_submit_bh_for_read(bh);
*wait_bh++ = bh;
} else {
-   u8 *kaddr = kmap_atomic(page,
-   KM_USER0);
-   memset(kaddr + bh_offset(bh),
-   0, blocksize);
-   kunmap_atomic(kaddr, KM_USER0

[PATCH 2/13] affs: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller [EMAIL PROTECTED]

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/affs/file.c 
linux-2.6.21-rc6-mm1-test/fs/affs/file.c
--- linux-2.6.21-rc6-mm1/fs/affs/file.c 2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/affs/file.c2007-04-09 18:18:23.0 
-0700
@@ -628,11 +628,7 @@ static int affs_prepare_write_ofs(struct
return err;
}
if (to < PAGE_CACHE_SIZE) {
-   char *kaddr = kmap_atomic(page, KM_USER0);
-
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, to, PAGE_CACHE_SIZE - to);
if (size > offset + to) {
if (size < offset + PAGE_CACHE_SIZE)
tmp = size & ~PAGE_CACHE_MASK;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/13] ext4: use zero_user_page

2007-04-10 Thread Nate Diller
Use zero_user_page() instead of open-coding it. 

Signed-off-by: Nate Diller [EMAIL PROTECTED]

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext4/inode.c 
linux-2.6.21-rc6-mm1-test/fs/ext4/inode.c
--- linux-2.6.21-rc6-mm1/fs/ext4/inode.c2007-04-10 17:15:04.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext4/inode.c   2007-04-10 18:33:04.0 
-0700
@@ -1791,7 +1791,6 @@ int ext4_block_truncate_page(handle_t *h
struct inode *inode = mapping->host;
struct buffer_head *bh;
int err = 0;
-   void *kaddr;
 
if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) &&
test_opt(inode->i_sb, EXTENTS) &&
@@ -1808,10 +1807,7 @@ int ext4_block_truncate_page(handle_t *h
 */
if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 ext4_should_writeback_data(inode) && PageUptodate(page)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
set_page_dirty(page);
goto unlock;
}
@@ -1864,11 +1860,7 @@ int ext4_block_truncate_page(handle_t *h
goto unlock;
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-
+   zero_user_page(page, offset, length);
BUFFER_TRACE(bh, "zeroed end of block");
 
err = 0;
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext4/writeback.c 
linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c
--- linux-2.6.21-rc6-mm1/fs/ext4/writeback.c2007-04-10 18:05:52.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c   2007-04-10 
18:33:04.0 -0700
@@ -961,7 +961,6 @@ int ext4_wb_writepage(struct page *page,
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
unsigned offset;
-   void *kaddr;
 
wb_debug("writepage %lu from inode %lu\n", page->index, inode->i_ino);
 
@@ -1011,10 +1010,7 @@ int ext4_wb_writepage(struct page *page,
 * the  page size, the remaining memory is zeroed when mapped, and
 * writes to that region are not written out to the file.
 */
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, PAGE_CACHE_SIZE - offset);
return ext4_wb_write_single_page(page, wbc);
 }
 
@@ -1065,7 +1061,6 @@ int ext4_wb_block_truncate_page(handle_t
struct inode *inode = mapping->host;
struct buffer_head bh, *bhw = &bh;
unsigned blocksize, length;
-   void *kaddr;
int err = 0;
 
wb_debug("partial truncate from %lu on page %lu from inode %lu\n",
@@ -1104,10 +1099,7 @@ int ext4_wb_block_truncate_page(handle_t
}
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length);
SetPageUptodate(page);
__set_page_dirty_nobuffers(page);
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] fs: use simple_prepare_write to zero page data

2007-04-09 Thread Nate Diller
It's common for file systems to need to zero data on either side of a write,
if a page is not Uptodate during prepare_write.  It just so happens that
simple_prepare_write() in libfs.c does exactly that, so we can avoid
duplication and just call that function to zero page data.
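
For reference, the zeroing that simple_prepare_write() does when the page is
not Uptodate looks roughly like this (paraphrased from memory of fs/libfs.c,
not a verbatim copy):

int simple_prepare_write(struct file *file, struct page *page,
			 unsigned from, unsigned to)
{
	if (!PageUptodate(page) && (to - from != PAGE_CACHE_SIZE)) {
		void *kaddr = kmap_atomic(page, KM_USER0);

		/* zero everything outside [from, to) */
		memset(kaddr, 0, from);
		memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
		flush_dcache_page(page);
		kunmap_atomic(kaddr, KM_USER0);
	}
	return 0;
}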

Compile tested on x86_64.

signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 cifs/file.c   |9 +
 ext4/writeback.c  |   17 +
 reiser4/plugin/item/extent_file_ops.c |   13 +++--
 3 files changed, 5 insertions(+), 34 deletions(-)

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/cifs/file.c 
linux-2.6.21-rc6-mm1-test/fs/cifs/file.c
--- linux-2.6.21-rc6-mm1/fs/cifs/file.c 2007-04-09 18:25:37.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/cifs/file.c2007-04-09 18:23:16.0 
-0700
@@ -1955,14 +1955,7 @@ static int cifs_prepare_write(struct fil
 * We don't need to read data beyond the end of the file.
 * zero it, and set the page uptodate
 */
-   void *kaddr = kmap_atomic(page, KM_USER0);
-
-   if (from)
-   memset(kaddr, 0, from);
-   if (to < PAGE_CACHE_SIZE)
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   simple_prepare_write(file, page, from, to);
SetPageUptodate(page);
} else if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
/* might as well read a page, it is fast enough */
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext4/writeback.c 
linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c
--- linux-2.6.21-rc6-mm1/fs/ext4/writeback.c2007-04-09 18:32:52.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c   2007-04-09 
18:23:16.0 -0700
@@ -819,21 +819,6 @@ int ext4_wb_writepages(struct address_sp
return 0;
 }
 
-static void ext4_wb_clear_page(struct page *page, int from, int to)
-{
-   void *kaddr;
-
-   if (to < PAGE_CACHE_SIZE || from > 0) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   if (PAGE_CACHE_SIZE > to)
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   if (0 < from)
-   memset(kaddr, 0, from);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
-}
-
 int ext4_wb_prepare_write(struct file *file, struct page *page,
  unsigned from, unsigned to)
 {
@@ -863,7 +848,7 @@ int ext4_wb_prepare_write(struct file *f
/* this block isn't allocated yet, reserve space */
wb_debug("reserve space for new block\n");
page->private = 1;
-   ext4_wb_clear_page(page, from, to);
+   simple_prepare_write(file, page, from, to);
ClearPageMappedToDisk(page);
} else { 
/* block is already mapped, so no need to reserve */
diff -urpN -X dontdiff 
linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c 
linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c   
2007-04-09 18:32:52.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c  
2007-04-09 18:31:34.0 -0700
@@ -1040,16 +1040,9 @@ ssize_t reiser4_write_extent(struct file
BUG_ON(get_current_context()->trans->atom != NULL);
 
lock_page(page);
-   if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {
-   void *kaddr;
-
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr, 0, page_off);
-   memset(kaddr + page_off + to_page, 0,
-  PAGE_CACHE_SIZE - (page_off + to_page));
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
+   if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE)
+   simple_prepare_write(file, page, page_off,
+page_off + to_page);
 
written = filemap_copy_from_user(page, page_off, buf, to_page);
flush_dcache_page(page);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] fs: use memclear_highpage_flush to zero page data

2007-04-09 Thread Nate Diller
It's very common for file systems to need to zero part or all of a page, the
simplest way is just to use kmap_atomic() and memset().  There's actually a
library function in include/linux/highmem.h that does exactly that, but it's
confusingly named memclear_highpage_flush(), which is descriptive of *how*
it does the work rather than what the *purpose* is.  So this patch renames
the function to zero_page_data(), and calls it from the various places that
currently open code it.
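
For reviewers, the renamed helper is just the existing body of
memclear_highpage_flush() under the new name; roughly (paraphrased, not a
verbatim copy of include/linux/highmem.h):

static inline void zero_page_data(struct page *page, unsigned int offset,
				  unsigned int size)
{
	void *kaddr;

	BUG_ON(offset + size > PAGE_SIZE);

	kaddr = kmap_atomic(page, KM_USER0);
	memset((char *)kaddr + offset, 0, size);
	flush_dcache_page(page);
	kunmap_atomic(kaddr, KM_USER0);
}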

Compile tested on x86_64.

signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 drivers/block/loop.c |6 ---
 fs/affs/file.c   |6 ---
 fs/buffer.c  |   53 +--
 fs/direct-io.c   |8 +---
 fs/ecryptfs/mmap.c   |   14 +---
 fs/ext3/inode.c  |   12 +--
 fs/ext4/inode.c  |   12 +--
 fs/ext4/writeback.c  |   12 +--
 fs/gfs2/bmap.c   |6 ---
 fs/mpage.c   |   11 +-
 fs/nfs/read.c|   10 ++---
 fs/nfs/write.c   |2 -
 fs/ntfs/aops.c   |   32 +++---
 fs/ntfs/file.c   |   47 +--
 fs/ocfs2/aops.c  |5 --
 fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
 fs/reiser4/plugin/file/file.c|6 ---
 fs/reiser4/plugin/item/ctail.c   |6 ---
 fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
 fs/reiser4/plugin/item/tail.c|8 +---
 fs/reiserfs/file.c   |   39 ++
 fs/reiserfs/inode.c  |   13 +--
 fs/xfs/linux-2.6/xfs_lrw.c   |2 -
 include/linux/highmem.h  |2 -
 mm/filemap_xip.c |7 
 mm/truncate.c|2 -
 26 files changed, 78 insertions(+), 281 deletions(-)

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/drivers/block/loop.c 
linux-2.6.21-rc6-mm1-test/drivers/block/loop.c
--- linux-2.6.21-rc6-mm1/drivers/block/loop.c   2007-04-09 17:24:00.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/drivers/block/loop.c  2007-04-09 
18:18:23.0 -0700
@@ -244,17 +244,13 @@ static int do_lo_send_aops(struct loop_d
transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec->bv_page, bv_offs, size, IV);
if (unlikely(transfer_result)) {
-   char *kaddr;
-
/*
 * The transfer failed, but we still write the data to
 * keep prepare/commit calls balanced.
 */
printk(KERN_ERR "loop: transfer error block %llu\n",
   (unsigned long long)index);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, size);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_page_data(page, offset, size);
}
flush_dcache_page(page);
ret = aops->commit_write(file, page, offset,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/affs/file.c 
linux-2.6.21-rc6-mm1-test/fs/affs/file.c
--- linux-2.6.21-rc6-mm1/fs/affs/file.c 2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/affs/file.c2007-04-09 18:18:23.0 
-0700
@@ -628,11 +628,7 @@ static int affs_prepare_write_ofs(struct
return err;
}
if (to < PAGE_CACHE_SIZE) {
-   char *kaddr = kmap_atomic(page, KM_USER0);
-
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_page_data(page, to, PAGE_CACHE_SIZE - to);
if (size > offset + to) {
if (size < offset + PAGE_CACHE_SIZE)
tmp = size & ~PAGE_CACHE_MASK;
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/buffer.c 
linux-2.6.21-rc6-mm1-test/fs/buffer.c
--- linux-2.6.21-rc6-mm1/fs/buffer.c2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/buffer.c   2007-04-09 18:18:23.0 
-0700
@@ -1862,13 +1862,8 @@ static int __block_prepare_write(struct 
if (block_start >= to)
break;
if (buffer_new(bh)) {
-   void *kaddr;
-
clear_buffer_new(bh);
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr+block_start, 0, bh->b_size);
-   flush_dcache_page(page);
-   

Re: Reiser4. BEST FILESYSTEM EVER.

2007-04-09 Thread Nate Diller

On 08 Apr 2007 06:32:26 +0200, Christer Weinigel <[EMAIL PROTECTED]> wrote:

[EMAIL PROTECTED] writes:

> Lennart. Tell me again that these results from
>
> http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
> http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm
>
> are not of interest to you. I still don't understand why you have your
> head in the sand.

Oh, for fucks sake, stop sounding like a broken record.  You have
repeated the same totally meaningless statistics more times than I
care to count.  Please shut the fuck up.


wow, it's really amazing how reiser4 can still inspire flamewars so
easily when Hans isn't even around to antagonize people and escalate
things


As you discovered yourself (even though you seem to fail to understand
the significance of your discovery), bonnie writes files that consist
of mostly zeroes.  If your normal use cases consist of creating a
bunch of files containing zeroes, reiser4 with compression will do
great.  Just lovely.  Except that nobody sane would store a lot of
files containing zeroes, except an an excercize in mental
masturbation.  So the two bonnie benchmarks with lzo and gzip are
totally meaningless for any real life usages.


yeah, i sure wish Grev was still around running the benchmarks and
regression testing, cause I thought she came up with a good, QA
oriented mix of real benchmarks.  aside from a number of streaming
video benchmarks i did, those were the only results i actually trusted
to compare reiser4 with other systems.  I know Ted doesn't like the
Mongo suite, cause it focuses on small files and shows the common
weakness of block-aligned storage ... personally i thought it was
great for its primary purpose, making sure reiser4 was optimized for
its target workload.  i also recall that the distribution of small
files to large ones in mongo was pulled from some paper out of CMU,
but i can't find the reference to that study right now.


As for the amount of disk needed to store three kernel trees, the
figures you quote show that Reiser4 does tail combining where the tail
of multiple files are stored in one disk block.  A nice trick that
seems save you about 15% disk space compared to ext3.  Now you have to
realise what that means, it means that if the disk block containing
those tails (or any metadata pointing at that block) gets corrupted,
instead of just losing one disk block for one file, you will have lost
the tail for all the files sharing that disk block.  Depending on your
personal prioritites, saving 15% of the space may be worth the risk to
you, or maybe not.  Personally, for the only disk I'm short on space
on, I mostly store flac encoded images of my CD collection, and saving
2kByte out of every 300MByte disk simply doesn't make any difference,
and I much prefer a stable file system that I can trust not to lose my
data.  You might make different choices.


well, it turns out that reiser4 does things a little differently,
since tail packing has bad performance effects (i always turn it off
on my reiserfs partitions).  Reiser4 guarantees a file will be stored
contiguously if it is below a certain size (20K?), and instead stores
the whole file unaligned, so that many files can be packed together
without slack space.  this gives the best of both worlds
performance-wise, at the expense of some complicated flush code to
pack everything together in the tree before it gets written.  that
combined with the fine-grained locking scheme (per-node -- reiserfs
just has a global lock) is the primary reason the code is so
convoluted ... not poor coding.


The same goes for just about every feature that you tout, it has its
advantages, and it has its disadvantages.  Doing compression on data
is great if the data you store is compressible, and sucks if it isn't.
Doing compression on each disk block and then packing multiple
compressed blocks into each physical disk block will probably save
some space if the data is compressible, but at the same time it means
that you will spend a lot of CPU (and cache footprint) compressing and
uncompressing that data.  On a single user system where the CPU is
mostly idle it might not make much of a difference, on a heavily
loaded multiuser system it might do.


my understanding of the code is that it uses a heuristic to decide if
a file is already compressed, so that the system doesn't waste time on
them and simply writes them out directly.  there may also be a way to
turn it off for certain classes of files, this would be most useful
for executables and the like that are frequently mmap()ed and we care
more about page-alignment than read bandwidth or data density.
edward?


Logs can be compressed quite well using a block based compression
scheme, but the logs can be compressed even better by doing
compression on the whole file with gzip.  So what's the best choice,
to do transparent compression on the fly giving ok compression or
teaching the userspace tools to do compression of old logs and get
really 


[BUG] reiser4: page lock recursion in reiser4_write_extent

2007-03-13 Thread Nate Diller

This little code snippet seems to have a page_lock recursion, in
addition to overall looking particularly fragile to me.  It seems to
be handling the case where a page needs to be brought uptodate because
a partial page write is being done.  The page gets locked as many as 3
times, each checking PageUptodate, however the two failure cases here
go BUG() instead of returning an error.  I'm starting to think that
somehow the whole suspect branch just never gets taken, because
otherwise I would expect to see bug reports related to -EIO, -ENOMEM,
etc causing this to barf.

either way, it seems there's a lock recursion if another thread races
to bring @page uptodate while we're waiting on the first lock_page()
call.

---

   page = jnode_page(jnodes[i]);
   if (page_offset(page) < inode->i_size &&
   !PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {
   /*
* the above is not optimal for partial write to last
* page of file when file size is not at boundary of
* page
*/
takes the lock
   lock_page(page);
raced with readpage?
   if (!PageUptodate(page)) {
readpage drops lock
   result = readpage_unix_file(NULL, page);
   BUG_ON(result != 0);
-ENOMEM?
   /* wait for read completion */
   lock_page(page);
   BUG_ON(!PageUptodate(page));
-EIO?
   unlock_page(page);
   } else
still have the lock here
   result = 0;
   }

   BUG_ON(get_current_context()->trans->atom != NULL);
   fault_in_pages_readable(buf, to_page);
   BUG_ON(get_current_context()->trans->atom != NULL);

BOOM!!!
   lock_page(page);
   if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {

---
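
for comparison, a minimal sketch of the pattern i'd expect instead -- take
the lock once, bring the page uptodate under it, and return errors rather
than BUG().  the helper name and error values here are mine, not from the
reiser4 tree, and it assumes ->readpage drops the page lock as readpage
implementations conventionally do:

static int prepare_partial_page(struct file *file, struct page *page)
{
	int result = 0;

	lock_page(page);
	if (!PageUptodate(page)) {
		/* assumed: ->readpage drops the page lock, success or not */
		result = page->mapping->a_ops->readpage(file, page);
		if (result)
			return result;
		/* retaking the lock waits for read completion */
		lock_page(page);
		if (!PageUptodate(page)) {
			unlock_page(page);
			return -EIO;
		}
	}
	/* page is locked and uptodate here; caller copies data, then unlocks */
	return result;
}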

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] fs: add an iovec iterator

2007-02-08 Thread Nate Diller

On 2/8/07, Nick Piggin <[EMAIL PROTECTED]> wrote:

On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote:
> On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote:
> > Add an iterator data structure to operate over an iovec. Add usercopy
> > operators needed by generic_file_buffered_write, and convert that function
> > over.
>
> iovec_iterator is an awfully long and not very descriptive name.
> In past discussions we named this thingy iodesc and wanted to pass it
> down all the I/O path, including the file operations.

Hi Christoph,

Sure I think it would be a good idea to shorten the name. And yes, although
I just construct the iterator to pass into perform_write, I think it should
make sense to go much further up the call stack instead of passing all those
args around. iodesc seems like a fine name, so I'll use that unless
anyone objects.


i had a patch integrating the iodesc idea, but after some thought, had
decided to call it struct file_io.  That name reflects the fact that
it's doing I/O in arbitrary lengths with byte offsets, and struct
file_io *fio contrasts well with struct bio (block_io).  I also had
used the field ->nbytes instead of ->count, to clarify the difference
between segment iterators, segment offsets, and absolute bytecount.
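
roughly, the shape i have in mind is something like this (field names are
illustrative guesses at my own work-in-progress, not from any posted patch):

struct file_io {
	struct file		*file;		/* file this I/O targets */
	const struct iovec	*iov;		/* current user segment */
	unsigned long		nr_segs;	/* segments remaining */
	size_t			iov_offset;	/* offset into current segment */
	size_t			nbytes;		/* total bytes left, across all segments */
	loff_t			pos;		/* file position of the transfer */
};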

FYI, the patch is still in the works and would convert the whole file
I/O stack to use the new structure.  I would like to base it off of
this work as well if this makes it into -mm (as I think it should)

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Tracking mlocked pages and moving them off the LRU

2007-02-07 Thread Nate Diller

On 2/7/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:

On Tue, 6 Feb 2007, Nate Diller wrote:

> > The dirty ratio with the ZVCS would be
> >
> > NR_DIRTY + NR_UNSTABLE_NFS
> > /
> > NR_FREE_PAGES + NR_INACTIVE + NR_ACTIVE + NR_MLOCK
>
> I don't understand why you want to account mlocked pages in
> dirty_ratio.  of course mlocked pages *can* be dirty, but they have no
> relevance in the write throttling code.  the point of dirty ratio is

mlocked pages can be counted as dirty pages. So if we do not include
NR_MLOCK in the number of total pages that could be dirty then we may in
some cases have >100% dirty pages.


unless we exclude mlocked dirty pages from NR_DIRTY accounting, which
is what i suggest should be done as part of this patch


> to guarantee that there are some number of non-dirty, non-pinned,
> non-mlocked pages on the LRU, to (try to) avoid deadlocks where the
> writeback path needs to allocate pages, which many filesystems like to
> do.  if an mlocked page is clean, there's still no way to free it up,
> so it should not be treated as being on the LRU at all, for write
> throttling.  the ideal (IMO) dirty ratio would be

Hmmm... I think write throttling is different from reclaim. In write
throttling the major objective is to decouple the applications from
the physical I/O. So the dirty ratio specifies how much "buffer" space
can be used for I/O. There is an issue that too many dirty pages will
cause difficulty for reclaim because pages can only be reclaimed after
writeback is complete.


NR_DIRTY is only used for write throttling, right?  well, and
reporting to user-space, but again, i suggest that user space should
get to see NR_MLOCKED as well.  would people flip out if NR_DIRTY
stopped showing pages that are mlocked, as long as a separate
NR_MLOCKED variable was present?
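
concretely, the throttling check i'm suggesting would look something like
this -- NR_DIRTY_MLOCKED is hypothetical, the other counters are the
existing ZVCs (NR_DIRTY in the discussion is NR_FILE_DIRTY in the code):

static int over_dirty_ratio(int vm_dirty_ratio)
{
	/* NR_DIRTY_MLOCKED is a hypothetical counter, not an existing ZVC */
	unsigned long dirty = global_page_state(NR_FILE_DIRTY)
			    - global_page_state(NR_DIRTY_MLOCKED)
			    + global_page_state(NR_UNSTABLE_NFS);
	unsigned long lru   = global_page_state(NR_FREE_PAGES)
			    + global_page_state(NR_INACTIVE)
			    + global_page_state(NR_ACTIVE);

	/* throttle writers once dirty pages exceed dirty_ratio percent of the LRU */
	return dirty * 100 > lru * vm_dirty_ratio;
}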


And yes this is not true for mlocked pages.

>
> NR_DIRTY - NR_DIRTY_MLOCKED + NR_UNSTABLE_NFS
>/
> NR_FREE_PAGES + NR_INACTIVE + NR_ACTIVE
>
> obviously it's kinda useless to keep an NR_DIRTY_MLOCKED counter, any
> of these mlock accounting schemes could easily be modified to update
> the NR_DIRTY counter so that it only reflects dirty unpinned pages,
> and not mlocked ones.

So you would be okay with dirty_ratio possibly be >100% of mlocked pages
are dirty?

> is that the only place you wanted to have an accurate mocked page count?

Rik had some other ideas on what to do with it. I also think we may end up
checking for excessive high mlock counts in various tight VM situations.


i'd be wary of a VM algorithm that treated mlocked pages any
differently than, say, unreclaimable slab pages.  but there are no
concrete suggestions yet, so i won't comment further.

all this is not to say that i dislike the idea of keeping mlocked
pages off the LRU, quite the opposite i've been looking for this for a
while and was hoping that Stone Wang's wired list patch
(http://lkml.org/lkml/2006/3/20/128) would get further than it did.
but i don't see the need to keep strict accounting if it hurts
performance in the common case.

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Tracking mlocked pages and moving them off the LRU

2007-02-06 Thread Nate Diller

On 2/4/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:

On Sun, 4 Feb 2007, Arjan van de Ven wrote:

>
> > Exclusion or inclusion of NR_MLOCK number is straightforward for the dirty
> > ratio calculations. global_page_state(NR_MLOCK) f.e. would get us totals on
> > mlocked pages per zone. node_page_state(NR_MLOCK) gives a node specific
> > number of mlocked pages. The nice thing about ZVCs is that it allows
> > easy access to counts on different levels.
>
> however... mlocked pages still can be dirty, and want to be written back
> at some point ;)

Yes that is why we need to add them to the count of total pages.

> I can see the point of doing dirty ratio as percentage of the LRU size,
> but in that case you don't need to track NR_MLOCK, only the total LRU
> size. (And yes it'll be sometimes optimistic because not all mlock'd
> pages are moved off the lru yet, but I doubt you'll have that as a
> problem in practice)

The dirty ratio with the ZVCS would be

NR_DIRTY + NR_UNSTABLE_NFS
/
NR_FREE_PAGES + NR_INACTIVE + NR_ACTIVE + NR_MLOCK


I don't understand why you want to account mlocked pages in
dirty_ratio.  of course mlocked pages *can* be dirty, but they have no
relevance in the write throttling code.  the point of dirty ratio is
to guarantee that there are some number of non-dirty, non-pinned,
non-mlocked pages on the LRU, to (try to) avoid deadlocks where the
writeback path needs to allocate pages, which many filesystems like to
do.  if an mlocked page is clean, there's still no way to free it up,
so it should not be treated as being on the LRU at all, for write
throttling.  the ideal (IMO) dirty ratio would be

NR_DIRTY - NR_DIRTY_MLOCKED + NR_UNSTABLE_NFS
   /
NR_FREE_PAGES + NR_INACTIVE + NR_ACTIVE

obviously it's kinda useless to keep an NR_DIRTY_MLOCKED counter, any
of these mlock accounting schemes could easily be modified to update
the NR_DIRTY counter so that it only reflects dirty unpinned pages,
and not mlocked ones.
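
in code, the check i'm describing boils down to something like this
(sketch only, not the actual page-writeback.c code -- it assumes the
NR_FREE_PAGES/NR_INACTIVE/NR_ACTIVE ZVC items from the pending patches,
and a dirty counter that no longer includes mlocked pages):

        /* sketch: throttle base without any NR_MLOCK term */
        static int over_dirty_ratio(void)
        {
                unsigned long dirty, base;

                dirty = global_page_state(NR_FILE_DIRTY) +
                        global_page_state(NR_UNSTABLE_NFS);
                base  = global_page_state(NR_FREE_PAGES) +
                        global_page_state(NR_INACTIVE) +
                        global_page_state(NR_ACTIVE);

                /* the writer gets throttled past this point, same as today */
                return dirty * 100 > vm_dirty_ratio * base;
        }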

is that the only place you wanted to have an accurate mlocked page count?

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-17 Thread Nate Diller

On Wed, 17 Jan 2007, Benjamin LaHaise wrote:


On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:

the right thing to do from a design perspective.  Hopefully it enables
a new architecture that can reduce context switches in I/O completion,
and reduce overhead.  That's the real motive ;)


And it's a broken motive.  Context switches per se are not bad, as they
make it possible to properly schedule code in a busy system (which is
*very* important when realtime concerns come into play).  Have a look
at how things were done in the 2.4 aio code to see how completion would
get done with a non-retry method, typically in interrupt context.  I had
code that did direct I/O rather differently by sharing code with the
read/write code paths at some point, the catch being that it was pretty
invasive, which meant that it never got merged with the changes to handle
writeback pressure and other work that happened during 2.5.


I'm having some trouble understanding your concern.  From my perspective,
any unnecessary context switch represents not only performance loss, but
extra complexity in the code.  In this case, I'm not suggesting that the
aio.c code causes problems, quite the opposite.  The code I'd like to change
is FS and md levels, where context switches happen because of timers,
workqueues, and worker threads.  For sync I/O, these layers could be doing
their completion work in process context, but because waiting on sync I/O is
done in layers above, they must resort to other means, even for the common
case.  The dm-crypt module is the most straightforward example.

I took a look at some 2.4.18 aio patches in kernel.org/.../bcrl/aio/, and if
I understand what you did, you were basically operating at the aops level
rather than f_ops.  I actually like that idea, it's nicer than having the
direct-io code do its work separately from the aio code.  Part of where I'm
going with this patch is a better integration between the block layer
(make_request), page layer (aops), and FS layer (f_ops), particularly in the
completion paths.  The direct-io code is an improvement over the common code
on that point, do_readahead() and friends all wait on individual pages to
become uptodate.  I'd like to bring some improvements from the directIO
architecture into use in the common case, which I hope will help
performance.

I know that might seem somewhat unrelated, but I don't think it is.  This
change goes hand in hand with using completion handlers in the aops.  That
will link together the completion callback in the bio with the aio callback,
so that the whole stack can finish its work in one context.


That said, you can't make kiocb private without completely removing the
ability of the rest of the kernel to complete an aio sanely from irq context.
You need some form of i/o descriptor, and a kiocb is just that.  Adding more
layering is just going to make things messier and slower for no real gain.


This patchset does not change how or when I/O completion happens;
aio_complete() will still get called from direct-io.c, nfs-direct.c, et al.
The iocb structure is still passed to aio_complete, just like before.  The
only difference is that the lower level code doesn't know that it's got an
iocb, all it sees is an opaque cookie.  It's more like enforcing a layer
that's already in place, and I think things got simpler rather than messier.
Whether things are slower or not remains to be seen, but I expect no
measurable changes either way with this patch.
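
just to make the shape of the layer concrete (illustrative only -- the
argument list of the real typedef in the patches may differ slightly):

        typedef void (*file_endio_t)(void *cookie, ssize_t bytes_done);

        /* the aio core hands its own completion in as the callback and
         * the kiocb as the cookie; lower layers never look inside it */
        static void aio_endio(void *cookie, ssize_t bytes_done)
        {
                aio_complete(cookie, bytes_done, 0);
        }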

I'm releasing a new version of the patch soon, it will use a new iodesc
structure to keep track of iovec state, which simplifies things further.  It
also will have a new version of the usb gadget code, and some general
cleanups.  I hope you'll take a look at it.

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 4/10][RFC] aio: convert aio_complete to file_endio_t

2007-01-16 Thread Nate Diller

On 1/15/07, David Brownell <[EMAIL PROTECTED]> wrote:

On Monday 15 January 2007 5:54 pm, Nate Diller wrote:
> --- a/drivers/usb/gadget/inode.c  2007-01-12 14:42:29.0 -0800
> +++ b/drivers/usb/gadget/inode.c  2007-01-12 14:25:34.0 -0800
> @@ -559,35 +559,32 @@ static int ep_aio_cancel(struct kiocb *i
>   return value;
>  }
>
> -static ssize_t ep_aio_read_retry(struct kiocb *iocb)
> +static int ep_aio_read_retry(struct kiocb *iocb)
>  {
>   struct kiocb_priv   *priv = iocb->private;
> - ssize_t len, total;
> - int i;
> + ssize_t total;
> + int i, err = 0;
>
>   /* we "retry" to get the right mm context for this: */
>
>   /* copy stuff into user buffers */
>   total = priv->actual;
> - len = 0;
>   for (i=0; i < priv->nr_segs; i++) {
>   ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
>
>   if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
> - if (len == 0)
> - len = -EFAULT;
> + err = -EFAULT;

Discarding the capability to report partial success, e.g. that the first N
bytes were properly transferred?  I don't see any virtue in that change.
Quite the opposite in fact.

I think you're also expecting that if N bytes were requested, that's always
how many will be received.  That's not true for packetized I/O such as USB
isochronous transfers ... where it's quite legit (and in some cases routine)
for the other end to send packets that are shorter than the maximum allowed.
Sending a zero length packet is not the same as sending no packet at all,
for another example.


I will convert this (usb) code to use the standard completion path,
which you will notice *gained* the ability to properly report both an
error and a partial success as part of this patch.  In fact, fixing
this up was my intention when I wrote this patch, and the later patch
was a compromise intended to get this whole bundle out for review in a
timely manner :)
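
partial transfers are definitely worth preserving; for comparison, the
usual read()/write() convention looks like this (sketch only, not code
taken from the patch):

        static ssize_t copy_out(const struct iovec *iv,
                                unsigned long nr_segs,
                                const char *buf, size_t total)
        {
                size_t copied = 0;
                unsigned long i;

                for (i = 0; i < nr_segs && copied < total; i++) {
                        size_t this = min(iv[i].iov_len, total - copied);

                        if (copy_to_user(iv[i].iov_base, buf + copied,
                                         this))
                                break;
                        copied += this;
                }
                /* report how much made it out; only fall back to
                 * -EFAULT when nothing was transferred at all */
                return copied ? copied : -EFAULT;
        }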

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops

2007-01-16 Thread Nate Diller

On 1/15/07, David Brownell <[EMAIL PROTECTED]> wrote:

On Monday 15 January 2007 5:54 pm, Nate Diller wrote:
> This removes the aio implementation from the usb gadget file system.

NAK.  I see a deep mis-understanding here.


> Aside
> from making very creative (!) use of the aio retry path, it can't be of any
> use performance-wise

Other than the basic win of letting one userspace thread keep an I/O
stream active while at the same time processing the data it reads or
writes??  That's the "async" part of AIO.

There's a not-so-little thing called "I/O overlap" ... which is the only
way to prevent wasting bandwidth between (non-cacheable) I/O requests,
and thus is the only way to let userspace code achieve anything close
to the maximum I/O bandwidth the hardware can achieve.

We want to see the host side "usbfs" evolve to support AIO like this
too, for the same reasons.  (Currently it has fairly ugly AIO code
that looks unlike any other AIO code in Linux.  Recent updates to
support a file-per-endpoint device model are a necessary precursor
to switching over to standard AIO syscalls.)


> because it always kmalloc()s a bounce buffer for the
> *whole* I/O size.

By and large that's a negligible factor compared to being able to
achieve I/O overlap.  ISTR the reason for not doing fancy DMA magic
was that the cost of this style AIO was under 1 KByte object code
on ARM, which was easy to justify ... while DMA magic to do that
sort of stuff would be much fatter, as well as more error prone.

(And that's why the "creative" use of the retry path.  As I've
observed before, "retry" is a misnomer in the general sense of
an async I/O framework.  It's more of a semi-completion callback;
I/O can't in general be "retried" on error or fault, and even in
the current usage it's not really a "retry".)


Now that high speed peripheral hardware is becoming more common on
embedded Linuxes -- TI has DaVinci, OMAP 2430, TUSB6010 (as found
in the new Nokia 800 tablets); Atmel AVR32 AP7000; at least a couple
parts that should be able to use the same musb_hdrc driver as those
TI parts; and a few other chips I've heard of -- there may be some
virtue in eliminating the memcpy, since those CPUs don't have many
MIPS to waste.  (Iff the memcpy turns out to be a real issue...)


> Perhaps the only reason to keep it around is the ability
> to cancel I/O requests, which only applies when using the user space async
> I/O interface.

It's good to have almost the complete kernel API functionality
exposed to userspace, and having I/O cancelation is an inevitable
consequence of a complete AIO framework ... but that particular
issue was not a driving concern.


The reason for AIO is to have a *STANDARD* userspace interface
for *ASYNC I/O* which otherwise can't exist.  You know, the kind
of I/O interface that can't be implemented with read() and write()
syscalls, which for non-buffered I/O necessarily preclude all I/O
overlap.  AIO itself is a direct match to most I/O frameworks'
primitives.  (AIOCB being directly analagous to peripheral side
"struct usb_request" and host side "struct urb".)


You know, I've always thought that one reason the AIO discussions
seemed strange is that they weren't really focussed on I/O (the
lowlevel after-the-caches stuff) so much as filesystems (several
layers up in the stack, with intervening caching frameworks).

The first several implementations of AIO that I saw were restricted
to "real" I/O and not applicable to disk backed files.  So while I
was glad the Linux approach didn't make that mistake, it's seemed
that it might be wanting to make a converse mistake: neglecting I/O
that isn't aimed at data stored on disks.


> I highly doubt that is enough incentive to justify the extra
> complexity here or in user-space, so I think it's a safe bet to remove this.
> If that feature is still desired, it would be possible to implement a sync
> interface that does an interruptible sleep.

What's needed is an async, non-sleeping, interface ... with I/O
overlap.  That's antithetical to using read()/write() calls, so
your proposed approach couldn't possibly work.


haha, wow ok you convinced me :)

I got a bit impatient when I was working on this, it took some time
just to figure out the intention of the code, and I'm trying to hold
to a bit of a schedule here.  Without any clear (to me) reason, I
didn't want to spend a lot of effort fixing this up.

There's really no big difference between the usb drivers here and the
disk I/O scheduler queue, AFAICT, so it seems like the solution I want
is to do a kmap() on the user buffer and then do the I/O straight out
of that.  That will eliminate the need for the bounce buffer.  I'll
post a new version along with the iodesc changes later this week.
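
roughly the direction i mean there -- heavily abbreviated, the function
below is made up rather than taken from any driver, and error handling
plus releasing the pages on the partial-pin case are left out:

        static int pin_user_buffer(unsigned long uaddr, size_t len,
                                   struct page **pages)
        {
                int nr = (((uaddr & ~PAGE_MASK) + len + PAGE_SIZE - 1)
                          >> PAGE_SHIFT);
                int got;

                /* pin the user pages instead of kmalloc'ing a bounce
                 * buffer; each page can then be kmap()ed (or mapped
                 * for DMA) right when the endpoint I/O touches it */
                down_read(&current->mm->mmap_sem);
                got = get_user_pages(current, current->mm,
                                     uaddr & PAGE_MASK, nr,
                                     1 /* write */, 0, pages, NULL);
                up_read(&current->mm->mmap_sem);

                return got == nr ? nr : -EFAULT;
        }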

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

2007-01-15 Thread Nate Diller

On 1/15/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:

On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> Convert code using iocb->ki_left to use the more generic iov_length() call.

No way.  We need to reduce the numer of iovec traversals, not adding
more of them.


ok, I can work on a version of this that uses struct iodesc.  Maybe
something like this?

struct iodesc {
   struct iovec *iov;
   unsigned long nr_segs;
   size_t nbytes;
};

I suppose it's worth doing the iodesc thing along with this patchset
anyway, since it'll avoid an extra round of interface churn.
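
the submission path would then walk the iovec exactly once, e.g.
(sketch):

        struct iodesc desc = {
                .iov     = iov,
                .nr_segs = nr_segs,
                .nbytes  = iov_length(iov, nr_segs),
        };

after that the lower layers only ever look at desc.nbytes instead of
re-deriving the length (or carrying ki_left around).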

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-15 Thread Nate Diller

On 1/15/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:

On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> This series is an attempt to generalize the async I/O paths to be
> implementation agnostic.  It completely eliminates knowledge of
> the kiocb structure in the generic code and makes it private within the
> current aio code.  Things get noticeably cleaner without that layering
> violation.
>
> The new interface takes a file_endio_t function pointer, and a private data
> pointer, which would normally be aio_complete and a kiocb pointer,
> respectively.  If the aio submission function gets back EIOCBQUEUED, that is
> a guarantee that the endio function will be called, or *already has been
> called*.  If the file_endio_t pointer provided to aio_[read|write] is NULL,
> the FS must block on I/O completion, then return either the number of bytes
> read, or an error.

I don't really like this patchset at all.  At some point it's a lot nicer
to have a lot of parameters that are related and passed down a long
callchain into a structure, and I think the aio code is over that threshold.
The completion function cleanups look okay to me, but I'd rather add
that completion function to struct kiocb instead of removing kiocb use.

I have this slight feeling you want to use these completions for something
else than the current aio code, if that's the case it would help
if you could explain briefly in what direction you're heading.


Actually I agree with you more than you might think.  I had intended
this to mesh with your struct iodesc idea, where iodesc would contain
the iovec pointer, nr_segs, iov_length, and whatever else needs to be
there, potentially even the endio function and its private data, tying
those to the iovec instead of a separate structure that needs to be
kept in sync.  There's a distinct layering that should exist between
things that should accompany the iovec transparently, and private data
that should be attached opaquely by layers above.

The biggest thing I have in mind for this patch, actually, is to fix
up the *sync* paths.  I don't think we should be waiting on sync I/O
at the *top* of the call stack, like with wait_on_sync_kiocb(), I'd
say the best place to wait is at the *bottom*, down in the I/O
scheduler.  This would make it a lot easier to clean up the completion
paths, because in the sync case, you'd be right back in process
context again as you traverse upward through the RAID, encryption,
loopback, directIO, FS log commit, etc.  It doesn't by itself
eliminate the need for all the threads and workqueues and such that
those layers each own, but it is a step in the right direction.
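
with an endio callback in hand, the sync wrapper (wherever it ends up
living -- ideally down near the queue) is just submit-and-wait,
something like this sketch; generic_file_read_endio() is a made-up name
standing in for the endio-taking submission path, and the callback
signature is illustrative:

        struct sync_cookie {
                struct completion done;
                ssize_t ret;
        };

        static void sync_endio(void *cookie, ssize_t bytes_done)
        {
                struct sync_cookie *s = cookie;

                /* runs in whatever context the completion happened in */
                s->ret = bytes_done;
                complete(&s->done);
        }

        static ssize_t sync_read(struct file *file,
                                 const struct iovec *iov,
                                 unsigned long nr_segs, loff_t pos)
        {
                struct sync_cookie s;
                ssize_t ret;

                init_completion(&s.done);
                ret = generic_file_read_endio(file, iov, nr_segs, pos,
                                              sync_endio, &s);
                if (ret != -EIOCBQUEUED)
                        return ret;

                wait_for_completion(&s.done);
                return s.ret;
        }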

Now if you want to talk about long-term vaporware style ideas, yeah, I
do have my own thoughts on how aio should work.  And from Agami's
perspective, this patch also makes it easier for us to do certain
debugging traces that we wish to hack together, in order to profile
performance on our platform.  But I'd be hesitant to make those
arguments, cause they are largely irrelevant (we can obviously carry
the patch for debugging without buy-in from the community).  This is
the right thing to do from a design perspective.  Hopefully it enables
a new architecture that can reduce context switches in I/O completion,
and reduce overhead.  That's the real motive ;)

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

