Re: [Patch] invalidate range of pages after direct IO write
>> But this won't happen if next started as 0 and we didn't update it.
>> I don't know if retrying is the intended behaviour or if we care that
>> the start == 0 case doesn't do it.
>
> Good point.  Let's make it explicit?

Looks great.  I briefly had visions of some bitfield to pack the three
boolean ints we have and then quickly came to my senses. :)

I threw together those other two patches that work with ranges around
direct IO (unmapping before r/w, and writing and waiting before reads).
rc3-mm1 is angry with my test machine so they're actually against
current -bk with this first invalidation patch applied.  I hope that
doesn't make life harder than it needs to be.  I'll send them under
separate cover.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [Patch] invalidate range of pages after direct IO write
Zach Brown <[EMAIL PROTECTED]> wrote:
>
> Andrew Morton wrote:
>
> > I'd be inclined to hold off on the macro until we actually get the
> > open-coded stuff right..  Sometimes the page lookup loops take an end+1
> > argument and sometimes they take an inclusive `end'.  I think.  That
> > might make it a bit messy.
> >
> > How's this look?
>
> Looks good.  It certainly stops the worst behaviour we were worried
> about.  I wonder about 2 things:
>
> 1) Now that we're calculating a specific length for pagevec_lookup(), is
> testing that page->index > end still needed for pages that get remapped
> somewhere weird before we locked?  If it is, I imagine we should test
> for < start as well.

Nope.  We're guaranteed that the pages which pagevec_lookup() returned are
a) at or beyond `start' and b) in ascending pgoff_t order, although not
necessarily contiguous.  There will be gaps for not-present pages.  It's
just an in-order gather.

So we need the `page->index > end' test to cope with the situation where a
request for three pages returned the three pages at indices 10, 11 and
3,000,000.

> 2) If we get a pagevec full of pages that fail the != mapping test we
> will probably just try again, not having changed next.  This is fine, we
> won't find them in our mapping again.

Yes, good point.  That page should go away once we drop our ref, and it's
not in the radix tree any more.

> But this won't happen if next started as 0 and we didn't update it.  I
> don't know if retrying is the intended behaviour or if we care that the
> start == 0 case doesn't do it.

Good point.  Let's make it explicit?
--- 25/mm/truncate.c~invalidate-range-of-pages-after-direct-io-write-fix-fix	Fri Feb  4 15:33:52 2005
+++ 25-akpm/mm/truncate.c	Fri Feb  4 15:34:47 2005
@@ -260,11 +260,12 @@ int invalidate_inode_pages2_range(struct
 	int i;
 	int ret = 0;
 	int did_range_unmap = 0;
+	int wrapped = 0;
 
 	pagevec_init(&pvec, 0);
 	next = start;
-	while (next <= end &&
-		!ret && pagevec_lookup(&pvec, mapping, next,
+	while (next <= end && !ret && !wrapped &&
+		pagevec_lookup(&pvec, mapping, next,
 			min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
 		for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -277,6 +278,8 @@ int invalidate_inode_pages2_range(struct
 			}
 			wait_on_page_writeback(page);
 			next = page->index + 1;
+			if (next == 0)
+				wrapped = 1;
 			while (page_mapped(page)) {
 				if (!did_range_unmap) {
 					/*
@@ -307,8 +310,6 @@ int invalidate_inode_pages2_range(struct
 		}
 		pagevec_release(&pvec);
 		cond_resched();
-		if (next == 0)
-			break;	/* The pgoff_t wrapped */
 	}
 	return ret;
 }
_

> I'm being less excited by the iterating macro the more I think about it.
> Tearing down the pagevec in some complicated for(;;) doesn't sound
> reliable and I fear that some function that takes a per-page callback
> function pointer from the caller will turn people's stomachs.

heh, OK.
Re: [Patch] invalidate range of pages after direct IO write
Andrew Morton wrote:

> I'd be inclined to hold off on the macro until we actually get the
> open-coded stuff right..  Sometimes the page lookup loops take an end+1
> argument and sometimes they take an inclusive `end'.  I think.  That
> might make it a bit messy.
>
> How's this look?

Looks good.  It certainly stops the worst behaviour we were worried
about.  I wonder about 2 things:

1) Now that we're calculating a specific length for pagevec_lookup(), is
testing that page->index > end still needed for pages that get remapped
somewhere weird before we locked?  If it is, I imagine we should test
for < start as well.

2) If we get a pagevec full of pages that fail the != mapping test we
will probably just try again, not having changed next.  This is fine, we
won't find them in our mapping again.  But this won't happen if next
started as 0 and we didn't update it.  I don't know if retrying is the
intended behaviour or if we care that the start == 0 case doesn't do it.

I'm being less excited by the iterating macro the more I think about it.
Tearing down the pagevec in some complicated for(;;) doesn't sound
reliable and I fear that some function that takes a per-page callback
function pointer from the caller will turn people's stomachs.

- z
Re: [Patch] invalidate range of pages after direct IO write
Zach Brown <[EMAIL PROTECTED]> wrote:
>
> > I'll make that change and plop the patch into -mm, but we need to think
> > about the infinite-loop problem..
>
> I can try hacking together that macro and auditing pagevec_lookup()
> callers..

I'd be inclined to hold off on the macro until we actually get the
open-coded stuff right..  Sometimes the page lookup loops take an end+1
argument and sometimes they take an inclusive `end'.  I think.  That might
make it a bit messy.

How's this look?

- Don't look up more pages than we're going to use

- Don't test page->index until we've locked the page

- Check for the cursor wrapping at the end of the mapping.

--- 25/mm/truncate.c~invalidate-range-of-pages-after-direct-io-write-fix	2005-02-03 18:20:22.000000000 -0800
+++ 25-akpm/mm/truncate.c	2005-02-03 18:28:24.627796400 -0800
@@ -264,18 +264,14 @@ int invalidate_inode_pages2_range(struct
 	pagevec_init(&pvec, 0);
 	next = start;
 	while (next <= end &&
-		!ret && pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+		!ret && pagevec_lookup(&pvec, mapping, next,
+			min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
 		for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 			int was_dirty;
 
-			if (page->index > end) {
-				next = page->index;
-				break;
-			}
-
 			lock_page(page);
-			if (page->mapping != mapping) {	/* truncate race? */
+			if (page->mapping != mapping || page->index > end) {
 				unlock_page(page);
 				continue;
 			}
@@ -311,6 +307,8 @@ int invalidate_inode_pages2_range(struct
 		}
 		pagevec_release(&pvec);
 		cond_resched();
+		if (next == 0)
+			break;	/* The pgoff_t wrapped */
 	}
 	return ret;
 }
_
Re: [Patch] invalidate range of pages after direct IO write
Andrew Morton wrote:

> Note that the same optimisation should be made in the call to
> unmap_mapping_range() in generic_file_direct_IO().  Currently we try and
> unmap the whole file, even if we're only writing a single byte.  Given
> that you're now calculating iov_length() in there we might as well use
> that number a few lines further up in that function.

Indeed.  I can throw that together.

I also have a patch that introduces filemap_write_and_wait_range() and
calls it from the read bit of __blockdev_direct_IO().  It didn't change
the cheesy fsx load I was using and then I got distracted.  I can try
harder.

> Reading the code, I'm unable to convince myself that it won't go into an
> infinite loop if it finds a page at page->index = -1.  But I didn't try
> very hard ;)

I'm unconvinced as well.  I got the pattern from
truncate_inode_pages_range() and it seems to have a related problem when
end is something that page_index can never be greater than.  It just
starts over.

I wonder if we should add some mapping_for_each_range() (less ridiculous
names welcome :)) macro that handles this crap for the caller who just
works with page pointers.  We could introduce some iterator struct that
the caller would put on the stack and pass in to hold state, something
like the 'n' in list_for_each_safe().  It looks like a few
pagevec_lookup() callers could use this.

> Minor note on this:
>
> 	return invalidate_inode_pages2_range(mapping, 0, ~0UL);
>
> I just use `-1' there.

Roger.  I was just mimicking invalidate_inode_pages().

> I'll make that change and plop the patch into -mm, but we need to think
> about the infinite-loop problem..

I can try hacking together that macro and auditing pagevec_lookup()
callers..

- z
Re: [Patch] invalidate range of pages after direct IO write
Zach Brown <[EMAIL PROTECTED]> wrote:
>
> After a direct IO write only invalidate the pages that the write
> intersected.  invalidate_inode_pages2_range(mapping, pgoff start, pgoff
> end) is added and called from generic_file_direct_IO().  This doesn't
> break some subtle agreement with some other part of the code, does it?

It should be OK.

Note that the same optimisation should be made in the call to
unmap_mapping_range() in generic_file_direct_IO().  Currently we try and
unmap the whole file, even if we're only writing a single byte.  Given
that you're now calculating iov_length() in there we might as well use
that number a few lines further up in that function.

Note that invalidate_inode_pages2_range() also does the unmapping thing -
in the direct-io case we don't expect that path to ever trigger: the only
way we'll find mapped pages here is if someone raced and faulted a page
back in.

Reading the code, I'm unable to convince myself that it won't go into an
infinite loop if it finds a page at page->index = -1.  But I didn't try
very hard ;)

Minor note on this:

	return invalidate_inode_pages2_range(mapping, 0, ~0UL);

I just use `-1' there.  We don't _know_ that pgoff_t is an unsigned long.
Some smarty may come along one day and make it unsigned long long, in
which case the code will break.  Using -1 here just works everywhere.

I'll make that change and plop the patch into -mm, but we need to think
about the infinite-loop problem..