Re: [Patch] invalidate range of pages after direct IO write

2005-02-07 Thread Zach Brown
>> But this won't happen if next
>>started as 0 and we didn't update it.  I don't know if retrying is the
>>intended behaviour or if we care that the start == 0 case doesn't do it.
> 
> 
> Good point.  Let's make it explicit?

Looks great.  I briefly had visions of some bitfield to pack the three
boolean ints we have and then quickly came to my senses. :)

I threw together those other two patches that work with ranges around
direct IO (unmapping before r/w, and writing and waiting before reads).
rc3-mm1 is angry with my test machine so they're actually against
current -bk with this first invalidation patch applied.  I hope that
doesn't make life harder than it needs to be.  I'll send them under
separate cover.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch] invalidate range of pages after direct IO write

2005-02-04 Thread Andrew Morton
Zach Brown <[EMAIL PROTECTED]> wrote:
>
> Andrew Morton wrote:
> 
> > I'd be inclined to hold off on the macro until we actually get the
> > open-coded stuff right..  Sometimes the page lookup loops take an end+1
> > argument and sometimes they take an inclusive `end'.  I think.  That might
> > make it a bit messy.
> > 
> > How's this look?
> 
> Looks good.  It certainly stops the worst behaviour we were worried
> about.  I wonder about 2 things:
> 
> 1) Now that we're calculating a specific length for pagevec_lookup(), is
> testing that page->index > end still needed for pages that get remapped
> somewhere weird before we locked?  If it is, I imagine we should test
> for < start as well.

Nope.  We're guaranteed that the pages which pagevec_lookup() returned are

a) at or beyond `start' and

b) in ascending pgoff_t order, although not necessarily contiguous. 
   There will be gaps for not-present pages.  It's just an in-order gather.

So we need the `page->index > end' test to cope with the situation where a
request for three pages returned the three pages at indices 10, 11 and
3,000,000.

> 2) If we get a pagevec full of pages that fail the != mapping test we
> will probably just try again, not having changed next.  This is fine, we
> won't find them in our mapping again.

Yes, good point.  That page should go away once we drop our ref, and
it's not in the radix tree any more.

>  But this won't happen if next
> started as 0 and we didn't update it.  I don't know if retrying is the
> intended behaviour or if we care that the start == 0 case doesn't do it.

Good point.  Let's make it explicit?

--- 25/mm/truncate.c~invalidate-range-of-pages-after-direct-io-write-fix-fix
Fri Feb  4 15:33:52 2005
+++ 25-akpm/mm/truncate.c   Fri Feb  4 15:34:47 2005
@@ -260,11 +260,12 @@ int invalidate_inode_pages2_range(struct
int i;
int ret = 0;
int did_range_unmap = 0;
+   int wrapped = 0;
 
pagevec_init(&pvec, 0);
next = start;
-   while (next <= end &&
-  !ret && pagevec_lookup(&pvec, mapping, next,
+   while (next <= end && !ret && !wrapped &&
+   pagevec_lookup(&pvec, mapping, next,
min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -277,6 +278,8 @@ int invalidate_inode_pages2_range(struct
}
wait_on_page_writeback(page);
next = page->index + 1;
+   if (next == 0)
+   wrapped = 1;
while (page_mapped(page)) {
if (!did_range_unmap) {
/*
@@ -307,8 +310,6 @@ int invalidate_inode_pages2_range(struct
}
pagevec_release(&pvec);
cond_resched();
-   if (next == 0)
-   break;  /* The pgoff_t wrapped */
}
return ret;
 }
_

> I'm being less excited by the iterating macro the more I think about it.
>Tearing down the pagevec in some complicated for(;;) doesn't sound
> reliable and I fear that some function that takes a per-page callback
> function pointer from the caller will turn people's stomachs.

heh, OK.



Re: [Patch] invalidate range of pages after direct IO write

2005-02-04 Thread Zach Brown
Andrew Morton wrote:

> I'd be inclined to hold off on the macro until we actually get the
> open-coded stuff right..  Sometimes the page lookup loops take an end+1
> argument and sometimes they take an inclusive `end'.  I think.  That might
> make it a bit messy.
> 
> How's this look?

Looks good.  It certainly stops the worst behaviour we were worried
about.  I wonder about 2 things:

1) Now that we're calculating a specific length for pagevec_lookup(), is
testing that page->index > end still needed for pages that get remapped
somewhere weird before we locked?  If it is, I imagine we should test
for < start as well.

2) If we get a pagevec full of pages that fail the != mapping test we
will probably just try again, not having changed next.  This is fine, we
won't find them in our mapping again.  But this won't happen if next
started as 0 and we didn't update it.  I don't know if retrying is the
intended behaviour or if we care that the start == 0 case doesn't do it.

I'm being less excited by the iterating macro the more I think about it.
   Tearing down the pagevec in some complicated for(;;) doesn't sound
reliable and I fear that some function that takes a per-page callback
function pointer from the caller will turn people's stomachs.

- z


Re: [Patch] invalidate range of pages after direct IO write

2005-02-03 Thread Andrew Morton
Zach Brown <[EMAIL PROTECTED]> wrote:
>
> > I'll make that change and plop the patch into -mm, but we need to think
>  > about the infinite-loop problem..
> 
>  I can try hacking together that macro and auditing pagevec_lookup()
>  callers..

I'd be inclined to hold off on the macro until we actually get the
open-coded stuff right..  Sometimes the page lookup loops take an end+1
argument and sometimes they take an inclusive `end'.  I think.  That might
make it a bit messy.

How's this look?



- Don't look up more pages than we're going to use

- Don't test page->index until we've locked the page

- Check for the cursor wrapping at the end of the mapping.

--- 25/mm/truncate.c~invalidate-range-of-pages-after-direct-io-write-fix
2005-02-03 18:20:22.0 -0800
+++ 25-akpm/mm/truncate.c   2005-02-03 18:28:24.627796400 -0800
@@ -264,18 +264,14 @@ int invalidate_inode_pages2_range(struct
pagevec_init(&pvec, 0);
next = start;
while (next <= end &&
-  !ret && pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+  !ret && pagevec_lookup(&pvec, mapping, next,
+   min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
int was_dirty;
 
-   if (page->index > end) {
-   next = page->index;
-   break;
-   }
-
lock_page(page);
-   if (page->mapping != mapping) { /* truncate race? */
+   if (page->mapping != mapping || page->index > end) {
unlock_page(page);
continue;
}
@@ -311,6 +307,8 @@ int invalidate_inode_pages2_range(struct
}
pagevec_release(&pvec);
cond_resched();
+   if (next == 0)
+   break;  /* The pgoff_t wrapped */
}
return ret;
 }
_



Re: [Patch] invalidate range of pages after direct IO write

2005-02-03 Thread Zach Brown
Andrew Morton wrote:

> Note that the same optimisation should be made in the call to
> unmap_mapping_range() in generic_file_direct_IO().  Currently we try and
> unmap the whole file, even if we're only writing a single byte.  Given that
> you're now calculating iov_length() in there we might as well use that
> number a few lines further up in that function.

Indeed.  I can throw that together.  I also have a patch that introduces
 filemap_write_and_wait_range() and calls it from the read bit of
__blockdev_direct_IO().  It didn't change the cheesy fsx load I was
using and then I got distracted.  I can try harder.

> Reading the code, I'm unable to convince myself that it won't go into an
> infinite loop if it finds a page at page->index = -1.  But I didn't try
> very hard ;)

I'm unconvinced as well.  I got the pattern from
truncate_inode_pages_range() and it seems to have a related problem when
`end' is something that page->index can never be greater than.  It just
starts over.

I wonder if we should add some mapping_for_each_range() (less ridiculous
names welcome :)) macro that handles this crap for the caller who just
works with page pointers.  We could introduce some iterator struct that
the caller would put on the stack and pass in to hold state, something
like the 'n' in list_for_each_safe().  It looks like a few
pagevec_lookup() callers could use this.

> Minor note on this:
> 
>   return invalidate_inode_pages2_range(mapping, 0, ~0UL);
> 
> I just use `-1' there.

Roger.  I was just mimicking invalidate_inode_pages().

> I'll make that change and plop the patch into -mm, but we need to think
> about the infinite-loop problem..

I can try hacking together that macro and auditing pagevec_lookup()
callers..

- z


Re: [Patch] invalidate range of pages after direct IO write

2005-02-03 Thread Andrew Morton
Zach Brown <[EMAIL PROTECTED]> wrote:
>
> After a direct IO write only invalidate the pages that the write intersected.
> invalidate_inode_pages2_range(mapping, pgoff start, pgoff end) is added and
> called from generic_file_direct_IO().  This doesn't break some subtle 
> agreement
> with some other part of the code, does it?

It should be OK.

Note that the same optimisation should be made in the call to
unmap_mapping_range() in generic_file_direct_IO().  Currently we try and
unmap the whole file, even if we're only writing a single byte.  Given that
you're now calculating iov_length() in there we might as well use that
number a few lines further up in that function.

Note that invalidate_inode_pages2[_range] also does the unmapping thing -
in the direct-io case we don't expect that path to ever trigger: the only
way we'll find mapped pages here is if someone raced and faulted a page
back in.

Reading the code, I'm unable to convince myself that it won't go into an
infinite loop if it finds a page at page->index = -1.  But I didn't try
very hard ;)

Minor note on this:

return invalidate_inode_pages2_range(mapping, 0, ~0UL);

I just use `-1' there.  We don't _know_ that pgoff_t is an unsigned long. 
Some smarty may come along one day and make it unsigned long long, in which
case the code will break.  Using -1 here just works everywhere.

I'll make that change and plop the patch into -mm, but we need to think
about the infinite-loop problem..

