Re: Test12 ll_rw_block error.
On Tue, 19 Dec 2000, Daniel Phillips wrote: > Marcelo Tosatti wrote: > > > > On Mon, 18 Dec 2000, Stephen C. Tweedie wrote: > > > > > Hi, > > > > > > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: > > > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: > > > > > > > > Stephen, > > > > > > > > The ->flush() operation (which we've been discussing a bit) would be very > > > > useful now (mainly for XFS). > > > > > > > > At page_launder(), we can call ->flush() if the given page has it defined. > > > > Otherwise use try_to_free_buffers() as we do now for filesystems which > > > > dont care about the special flushing treatment. > > > > > > As of 2.4.0test12, page_launder() will already call the > > > per-address-space writepage() operation for dirty pages. Do you need > > > something similar for clean pages too, or does Linus's new laundry > > > code give you what you need now? > > > > I think the semantics of the filesystem specific ->flush and ->writepage > > are not the same. > > > > Is ok for filesystem specific writepage() code to sync other "physically > > contiguous" dirty pages with reference to the one requested by > > writepage() ? > > > > If so, it can do the same job as the ->flush() idea we've discussing. > > Except that for ->writepage you don't have the option of *not* writing > the specified page. It does. Both the swapper writepage() operation and shm_writepage() cannot be able to write the page. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
In article <[EMAIL PROTECTED]> you wrote: >> I think the semantics of the filesystem specific ->flush and ->writepage >> are not the same. >> >> Is ok for filesystem specific writepage() code to sync other "physically >> contiguous" dirty pages with reference to the one requested by >> writepage() ? >> >> If so, it can do the same job as the ->flush() idea we've discussing. > > Except that for ->writepage you don't have the option of *not* writing > the specified page. In current -test13pre you have. Christoph -- Whip me. Beat me. Make me maintain AIX. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Marcelo Tosatti wrote: > > On Mon, 18 Dec 2000, Stephen C. Tweedie wrote: > > > Hi, > > > > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: > > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: > > > > > > Stephen, > > > > > > The ->flush() operation (which we've been discussing a bit) would be very > > > useful now (mainly for XFS). > > > > > > At page_launder(), we can call ->flush() if the given page has it defined. > > > Otherwise use try_to_free_buffers() as we do now for filesystems which > > > dont care about the special flushing treatment. > > > > As of 2.4.0test12, page_launder() will already call the > > per-address-space writepage() operation for dirty pages. Do you need > > something similar for clean pages too, or does Linus's new laundry > > code give you what you need now? > > I think the semantics of the filesystem specific ->flush and ->writepage > are not the same. > > Is ok for filesystem specific writepage() code to sync other "physically > contiguous" dirty pages with reference to the one requested by > writepage() ? > > If so, it can do the same job as the ->flush() idea we've discussing. Except that for ->writepage you don't have the option of *not* writing the specified page. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Mon, 18 Dec 2000, Stephen C. Tweedie wrote: > Hi, > > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: > > > > Stephen, > > > > The ->flush() operation (which we've been discussing a bit) would be very > > useful now (mainly for XFS). > > > > At page_launder(), we can call ->flush() if the given page has it defined. > > Otherwise use try_to_free_buffers() as we do now for filesystems which > > dont care about the special flushing treatment. > > As of 2.4.0test12, page_launder() will already call the > per-address-space writepage() operation for dirty pages. Do you need > something similar for clean pages too, or does Linus's new laundry > code give you what you need now? I think the semantics of the filesystem specific ->flush and ->writepage are not the same. Is ok for filesystem specific writepage() code to sync other "physically contiguous" dirty pages with reference to the one requested by writepage() ? If so, it can do the same job as the ->flush() idea we've discussing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Mon, 18 Dec 2000, Stephen C. Tweedie wrote: Hi, On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: Stephen, The -flush() operation (which we've been discussing a bit) would be very useful now (mainly for XFS). At page_launder(), we can call -flush() if the given page has it defined. Otherwise use try_to_free_buffers() as we do now for filesystems which dont care about the special flushing treatment. As of 2.4.0test12, page_launder() will already call the per-address-space writepage() operation for dirty pages. Do you need something similar for clean pages too, or does Linus's new laundry code give you what you need now? I think the semantics of the filesystem specific -flush and -writepage are not the same. Is ok for filesystem specific writepage() code to sync other "physically contiguous" dirty pages with reference to the one requested by writepage() ? If so, it can do the same job as the -flush() idea we've discussing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Marcelo Tosatti wrote: On Mon, 18 Dec 2000, Stephen C. Tweedie wrote: Hi, On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: Stephen, The -flush() operation (which we've been discussing a bit) would be very useful now (mainly for XFS). At page_launder(), we can call -flush() if the given page has it defined. Otherwise use try_to_free_buffers() as we do now for filesystems which dont care about the special flushing treatment. As of 2.4.0test12, page_launder() will already call the per-address-space writepage() operation for dirty pages. Do you need something similar for clean pages too, or does Linus's new laundry code give you what you need now? I think the semantics of the filesystem specific -flush and -writepage are not the same. Is ok for filesystem specific writepage() code to sync other "physically contiguous" dirty pages with reference to the one requested by writepage() ? If so, it can do the same job as the -flush() idea we've discussing. Except that for -writepage you don't have the option of *not* writing the specified page. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
In article [EMAIL PROTECTED] you wrote: I think the semantics of the filesystem specific -flush and -writepage are not the same. Is ok for filesystem specific writepage() code to sync other "physically contiguous" dirty pages with reference to the one requested by writepage() ? If so, it can do the same job as the -flush() idea we've discussing. Except that for -writepage you don't have the option of *not* writing the specified page. In current -test13pre you have. Christoph -- Whip me. Beat me. Make me maintain AIX. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Tue, 19 Dec 2000, Daniel Phillips wrote: Marcelo Tosatti wrote: On Mon, 18 Dec 2000, Stephen C. Tweedie wrote: Hi, On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: Stephen, The -flush() operation (which we've been discussing a bit) would be very useful now (mainly for XFS). At page_launder(), we can call -flush() if the given page has it defined. Otherwise use try_to_free_buffers() as we do now for filesystems which dont care about the special flushing treatment. As of 2.4.0test12, page_launder() will already call the per-address-space writepage() operation for dirty pages. Do you need something similar for clean pages too, or does Linus's new laundry code give you what you need now? I think the semantics of the filesystem specific -flush and -writepage are not the same. Is ok for filesystem specific writepage() code to sync other "physically contiguous" dirty pages with reference to the one requested by writepage() ? If so, it can do the same job as the -flush() idea we've discussing. Except that for -writepage you don't have the option of *not* writing the specified page. It does. Both the swapper writepage() operation and shm_writepage() cannot be able to write the page. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Hi, On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: > > Stephen, > > The ->flush() operation (which we've been discussing a bit) would be very > useful now (mainly for XFS). > > At page_launder(), we can call ->flush() if the given page has it defined. > Otherwise use try_to_free_buffers() as we do now for filesystems which > dont care about the special flushing treatment. As of 2.4.0test12, page_launder() will already call the per-address-space writepage() operation for dirty pages. Do you need something similar for clean pages too, or does Linus's new laundry code give you what you need now? Cheers, Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Hi, On Sat, Dec 16, 2000 at 07:08:02PM -0600, Russell Cattelan wrote: > > There is a very clean way of doing this with address spaces. It's > > something I would like to see done properly for 2.5: eliminate all > > knowledge of buffer_heads from the VM layer. It would be pretty > > simple to remove page->buffers completely and replace it with a > > page->private pointer, owned by whatever address_space controlled the > > page. Instead of trying to unmap and flush buffers on the page > > directly, these operations would become address_space operations. > > Yes this is a lot of what page buf would like to do eventually. > Have the VM system pressure page_buf for pages which would > then be able to intelligently call the file system to free up cached pages. > A big part of getting Delay Alloc to not completely consume all the > system pages, is being told when it's time to start really allocating disk > space and push pages out. Delayed allocation is actually much easier, since it's entirely an operation on logical page addresses, not physical ones --- by definition you don't have any buffer_heads yet because you haven't decided on the disk blocks. If you're just dealing with pages, not blocks, then the address_space is the natural way of dealing with it already. Only the full semantics of the flush callback have been missing to date, and with 2.4.0-test12 even that is mostly solved, since page_launder will give you the writeback() callbacks you need to flush things to disk when you start getting memory pressure. You can even treat the writepage() as an advisory call. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Hi, On Sat, Dec 16, 2000 at 07:08:02PM -0600, Russell Cattelan wrote: There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. It would be pretty simple to remove page-buffers completely and replace it with a page-private pointer, owned by whatever address_space controlled the page. Instead of trying to unmap and flush buffers on the page directly, these operations would become address_space operations. Yes this is a lot of what page buf would like to do eventually. Have the VM system pressure page_buf for pages which would then be able to intelligently call the file system to free up cached pages. A big part of getting Delay Alloc to not completely consume all the system pages, is being told when it's time to start really allocating disk space and push pages out. Delayed allocation is actually much easier, since it's entirely an operation on logical page addresses, not physical ones --- by definition you don't have any buffer_heads yet because you haven't decided on the disk blocks. If you're just dealing with pages, not blocks, then the address_space is the natural way of dealing with it already. Only the full semantics of the flush callback have been missing to date, and with 2.4.0-test12 even that is mostly solved, since page_launder will give you the writeback() callbacks you need to flush things to disk when you start getting memory pressure. You can even treat the writepage() as an advisory call. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Hi, On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote: On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: Stephen, The -flush() operation (which we've been discussing a bit) would be very useful now (mainly for XFS). At page_launder(), we can call -flush() if the given page has it defined. Otherwise use try_to_free_buffers() as we do now for filesystems which dont care about the special flushing treatment. As of 2.4.0test12, page_launder() will already call the per-address-space writepage() operation for dirty pages. Do you need something similar for clean pages too, or does Linus's new laundry code give you what you need now? Cheers, Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Sat, 16 Dec 2000, Russell Cattelan wrote: > > > I'm curious about this. > Does the mean reiserFS is doing all of it's own buffer management? > > This would seem a little redundant with what is already in the kernel? > For metadata only reiserfs does its own write management. The buffers come from getblk. We just don't mark the buffers dirty for flushing by flush_dirty_buffers() This has the advantage of avoiding races against bdflush and friends, and makes it easier to keep track of which buffers have actually made their way to disk. It has all of the obvious disadvantages with respect to memory pressure. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Sat, 16 Dec 2000, Russell Cattelan wrote: I'm curious about this. Does the mean reiserFS is doing all of it's own buffer management? This would seem a little redundant with what is already in the kernel? For metadata only reiserfs does its own write management. The buffers come from getblk. We just don't mark the buffers dirty for flushing by flush_dirty_buffers() This has the advantage of avoiding races against bdflush and friends, and makes it easier to keep track of which buffers have actually made their way to disk. It has all of the obvious disadvantages with respect to memory pressure. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: > Hi, > > On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: > > On Thu, 14 Dec 2000, Linus Torvalds wrote: > > > > Just one: any fs that really cares about completion callback is very likely > > to be picky about the requests ordering. So sync_buffers() is very unlikely > > to be useful anyway. > > > > In that sense we really don't have anonymous buffers here. I seriously > > suspect that "unrealistic" assumption is not unrealistic at all. I'm > > not sufficiently familiar with XFS code to say for sure, but... > > Right. ext3 and reiserfs just want to submit their own IOs when it > comes to the journal. (At least in ext3, already-journaled buffers > can be written back by the VM freely.) It's a matter of telling the > fs when that should start. > > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > > That's it. If fs wants unusual completions for requests - let it have its > > own queueing mechanism and submit these requests when it finds that convenient. > > There is a very clean way of doing this with address spaces. It's > something I would like to see done properly for 2.5: eliminate all > knowledge of buffer_heads from the VM layer. It would be pretty > simple to remove page->buffers completely and replace it with a > page->private pointer, owned by whatever address_space controlled the > page. Instead of trying to unmap and flush buffers on the page > directly, these operations would become address_space operations. > > We could still provide the standard try_to_free_buffers() and > unmap_underlying_metadata() functions to operate on the per-page > buffer_head lists, and existing filesystems would only have to point > their address_space "private metadata" operations at the generic > functions. However, a filesystem which had additional ordering > constraints could then intercept the flush or writeback calls from the > VM and decide on its own how best to honour the VM pressure. Stephen, The ->flush() operation (which we've been discussing a bit) would be very useful now (mainly for XFS). At page_launder(), we can call ->flush() if the given page has it defined. Otherwise use try_to_free_buffers() as we do now for filesystems which dont care about the special flushing treatment. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
"Stephen C. Tweedie" wrote: > Hi, > > On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: > > On Thu, 14 Dec 2000, Linus Torvalds wrote: > > > > Just one: any fs that really cares about completion callback is very likely > > to be picky about the requests ordering. So sync_buffers() is very unlikely > > to be useful anyway. > > > > In that sense we really don't have anonymous buffers here. I seriously > > suspect that "unrealistic" assumption is not unrealistic at all. I'm > > not sufficiently familiar with XFS code to say for sure, but... > > Right. ext3 and reiserfs just want to submit their own IOs when it > comes to the journal. (At least in ext3, already-journaled buffers > can be written back by the VM freely.) It's a matter of telling the > fs when that should start. > > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > > That's it. If fs wants unusual completions for requests - let it have its > > own queueing mechanism and submit these requests when it finds that convenient. > > There is a very clean way of doing this with address spaces. It's > something I would like to see done properly for 2.5: eliminate all > knowledge of buffer_heads from the VM layer. It would be pretty > simple to remove page->buffers completely and replace it with a > page->private pointer, owned by whatever address_space controlled the > page. Instead of trying to unmap and flush buffers on the page > directly, these operations would become address_space operations. Yes this is a lot of what page buf would like to do eventually. Have the VM system pressure page_buf for pages which would then be able to intelligently call the file system to free up cached pages. A big part of getting Delay Alloc to not completely consume all the system pages, is being told when it's time to start really allocating disk space and push pages out. > > > We could still provide the standard try_to_free_buffers() and > unmap_underlying_metadata() functions to operate on the per-page > buffer_head lists, and existing filesystems would only have to point > their address_space "private metadata" operations at the generic > functions. However, a filesystem which had additional ordering > constraints could then intercept the flush or writeback calls from the > VM and decide on its own how best to honour the VM pressure. > > This even works both for hashed and unhashed anonymous buffers, *if* > you allow the filesystem to attach all of its hashed buffers buffers > to an address_space of its own rather than having the buffer cache > pool all such buffers together. > > --Stephen -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Chris Mason wrote: > On Fri, 15 Dec 2000, Alexander Viro wrote: > > > Just one: any fs that really cares about completion callback is very likely > > to be picky about the requests ordering. So sync_buffers() is very unlikely > > to be useful anyway. > > > Somewhat. I guess there are at least two ways to do it. First flush the > buffers where ordering matters (log blocks), then send the others onto the > dirty list (general metadata). You might have your own end_io for those, and > sync_buffers would lose it. > > Second way (reiserfs recently changed to this method) is to do all the > flushing yourself, and remove the need for an end_io call back. > I'm curious about this. Does the mean reiserFS is doing all of it's own buffer management? This would seem a little redundant with what is already in the kernel? > > > > In that sense we really don't have anonymous buffers here. I seriously > > suspect that "unrealistic" assumption is not unrealistic at all. I'm > > not sufficiently familiar with XFS code to say for sure, but... > > > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > > That's it. If fs wants unusual completions for requests - let it have its > > own queueing mechanism and submit these requests when it finds that convenient. > > > Yes, this is exactly what we've discussed. > > -chris -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Alexander Viro wrote: > On Thu, 14 Dec 2000, Linus Torvalds wrote: > > > Good point. > > > > This actually looks fairly nasty to fix. The obvious fix would be to not > > put such buffers on the dirty list at all, and instead rely on the VM > > layer calling "writepage()" when it wants to push out the pages. > > That would be the nice behaviour from a VM standpoint. > > > > However, that assumes that you don't have any "anonymous" buffers, which > > is probably an unrealistic assumption. > > > > The problem is that we don't have any per-buffer "writebuffer()" function, > > the way we have them per-page. It was never needed for any of the normal > > filesystems, and XFS just happened to be able to take advantage of the > > b_end_io behaviour. > > > > Suggestions welcome. > > Just one: any fs that really cares about completion callback is very likely > to be picky about the requests ordering. So sync_buffers() is very unlikely > to be useful anyway. Actually no, that's not the issue. The XFS log uses a LSN (Log Sequence Number) to keep track of log write ordering. Sync IO on each log buffer isn't realistic; the performance hit would be to great. I wasn't around when most of XFS was developed, but from I what I understand it was discovered early on that firing off writes in a particular order doesn't guarantee that they will finish in that order. Thus the implantation of a sequence number for each log write. One of the obstacles we ran into early on in the linux port was the fact that linux used fixed size IO requests to any given device. But most of XFS's meta data structures vary in size in multiples of 512 bytes. We were also implementing a page caching / clustering layer called page_buf which understands primarily pages and not necessary disk blocks. If your FS block size happens to match your page size then things are good, but it doesn't So we added a bit map field to the pages structure. Each bit then represents one BASIC BLOCK eg 512 for all practical purposes The end_io functions XFS defines updates the correct bit or the whole bit array if the whole page is valid, thus signaling the rest of the page_buf that the io has completed. Ok there is a lot more to it than what I've just described but you probably get the idea. > > > In that sense we really don't have anonymous buffers here. I seriously > suspect that "unrealistic" assumption is not unrealistic at all. I'm > not sufficiently familiar with XFS code to say for sure, but... > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > That's it. If fs wants unusual completions for requests - let it have its > own queueing mechanism and submit these requests when it finds that convenient. > > Stephen, you probably already thought about that area. Could you comment on > that? > Cheers, > Al -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Linus Torvalds wrote: > On Thu, 14 Dec 2000, Russell Cattelan wrote: > > > > Ok one more wrinkle. > > sync_buffers calls ll_rw_block, this is going to have the same problem as > > calling ll_rw_block directly. > > Good point. > > This actually looks fairly nasty to fix. The obvious fix would be to not > put such buffers on the dirty list at all, and instead rely on the VM > layer calling "writepage()" when it wants to push out the pages. > That would be the nice behaviour from a VM standpoint. > > However, that assumes that you don't have any "anonymous" buffers, which > is probably an unrealistic assumption. > > The problem is that we don't have any per-buffer "writebuffer()" function, > the way we have them per-page. It was never needed for any of the normal > filesystems, and XFS just happened to be able to take advantage of the > b_end_io behaviour. > > Suggestions welcome. > > Linus Ok after a bit of trial and error I do have something working. I wouldn't call it the most elegant solution but it does work and it isn't very intrusive. #define BH_End_io 7/* End io function defined don't remap it */ /* don't change the callback if somebody explicitly set it */ if(!test_bit(BH_End_io, >b_state)){ bh->b_end_io = end_buffer_io_sync; } What I've done is in the XFS set buffer_head setup functions is set the initial value of b_state to BH_Locked and BH_End_io set the callback function and the rest of the relevant fields and then unlock the buffer. The only other quick fix that comes to mind is to change sync_buffers to use submit_bh rather than ll_rw_block. -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Sat, 16 Dec 2000, Linus Torvalds wrote: > Your patch looks fine, although I'd personally prefer this one even more: > Yes, that looks better and works here. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Fri, 15 Dec 2000, Linus Torvalds wrote: [ writepage for anon buffers ] > > It might be 10 lines of change, and obviously correct. > I'll give this a try, it will be interesting regardless of if it is simple enough for kernel inclusion. On a related note, I hit a snag porting reiserfs into test12, where block_read_full_page assumes the buffer it gets back from get_block won't be up to date. When reiserfs reads a packed tail directly into the page, reading the newly mapped buffer isn't required, and is actually a bad idea, since the packed tails have a block number of 0 when copied into the page cache. In other words, after calling reiserfs_get_block, the buffer might be mapped and uptodate, with no i/o required in block_read_full_page The following patch to block_read_full_page fixes things for me, and seems like a good idea in general. It might be better to apply something similar to submit_bh instead...comments? -chris --- linux-test12/fs/buffer.cMon Dec 18 11:37:42 2000 +++ linux/fs/buffer.c Mon Dec 18 11:38:36 2000 @@ -1706,8 +1706,10 @@ } } - arr[nr] = bh; - nr++; + if (!buffer_uptodate(bh)) { + arr[nr] = bh; + nr++; + } } while (i++, iblock++, (bh = bh->b_this_page) != head); if (!nr) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Fri, 15 Dec 2000, Linus Torvalds wrote: [ writepage for anon buffers ] It might be 10 lines of change, and obviously correct. I'll give this a try, it will be interesting regardless of if it is simple enough for kernel inclusion. On a related note, I hit a snag porting reiserfs into test12, where block_read_full_page assumes the buffer it gets back from get_block won't be up to date. When reiserfs reads a packed tail directly into the page, reading the newly mapped buffer isn't required, and is actually a bad idea, since the packed tails have a block number of 0 when copied into the page cache. In other words, after calling reiserfs_get_block, the buffer might be mapped and uptodate, with no i/o required in block_read_full_page The following patch to block_read_full_page fixes things for me, and seems like a good idea in general. It might be better to apply something similar to submit_bh instead...comments? -chris --- linux-test12/fs/buffer.cMon Dec 18 11:37:42 2000 +++ linux/fs/buffer.c Mon Dec 18 11:38:36 2000 @@ -1706,8 +1706,10 @@ } } - arr[nr] = bh; - nr++; + if (!buffer_uptodate(bh)) { + arr[nr] = bh; + nr++; + } } while (i++, iblock++, (bh = bh-b_this_page) != head); if (!nr) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Sat, 16 Dec 2000, Linus Torvalds wrote: Your patch looks fine, although I'd personally prefer this one even more: Yes, that looks better and works here. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Linus Torvalds wrote: On Thu, 14 Dec 2000, Russell Cattelan wrote: Ok one more wrinkle. sync_buffers calls ll_rw_block, this is going to have the same problem as calling ll_rw_block directly. Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Linus Ok after a bit of trial and error I do have something working. I wouldn't call it the most elegant solution but it does work and it isn't very intrusive. #define BH_End_io 7/* End io function defined don't remap it */ /* don't change the callback if somebody explicitly set it */ if(!test_bit(BH_End_io, bh-b_state)){ bh-b_end_io = end_buffer_io_sync; } What I've done is in the XFS set buffer_head setup functions is set the initial value of b_state to BH_Locked and BH_End_io set the callback function and the rest of the relevant fields and then unlock the buffer. The only other quick fix that comes to mind is to change sync_buffers to use submit_bh rather than ll_rw_block. -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Alexander Viro wrote: On Thu, 14 Dec 2000, Linus Torvalds wrote: Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. Actually no, that's not the issue. The XFS log uses a LSN (Log Sequence Number) to keep track of log write ordering. Sync IO on each log buffer isn't realistic; the performance hit would be to great. I wasn't around when most of XFS was developed, but from I what I understand it was discovered early on that firing off writes in a particular order doesn't guarantee that they will finish in that order. Thus the implantation of a sequence number for each log write. One of the obstacles we ran into early on in the linux port was the fact that linux used fixed size IO requests to any given device. But most of XFS's meta data structures vary in size in multiples of 512 bytes. We were also implementing a page caching / clustering layer called page_buf which understands primarily pages and not necessary disk blocks. If your FS block size happens to match your page size then things are good, but it doesn't So we added a bit map field to the pages structure. Each bit then represents one BASIC BLOCK eg 512 for all practical purposes The end_io functions XFS defines updates the correct bit or the whole bit array if the whole page is valid, thus signaling the rest of the page_buf that the io has completed. Ok there is a lot more to it than what I've just described but you probably get the idea. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Stephen, you probably already thought about that area. Could you comment on that? Cheers, Al -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Chris Mason wrote: On Fri, 15 Dec 2000, Alexander Viro wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. Somewhat. I guess there are at least two ways to do it. First flush the buffers where ordering matters (log blocks), then send the others onto the dirty list (general metadata). You might have your own end_io for those, and sync_buffers would lose it. Second way (reiserfs recently changed to this method) is to do all the flushing yourself, and remove the need for an end_io call back. I'm curious about this. Does the mean reiserFS is doing all of it's own buffer management? This would seem a little redundant with what is already in the kernel? In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Yes, this is exactly what we've discussed. -chris -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
"Stephen C. Tweedie" wrote: Hi, On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: On Thu, 14 Dec 2000, Linus Torvalds wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... Right. ext3 and reiserfs just want to submit their own IOs when it comes to the journal. (At least in ext3, already-journaled buffers can be written back by the VM freely.) It's a matter of telling the fs when that should start. What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. It would be pretty simple to remove page-buffers completely and replace it with a page-private pointer, owned by whatever address_space controlled the page. Instead of trying to unmap and flush buffers on the page directly, these operations would become address_space operations. Yes this is a lot of what page buf would like to do eventually. Have the VM system pressure page_buf for pages which would then be able to intelligently call the file system to free up cached pages. A big part of getting Delay Alloc to not completely consume all the system pages, is being told when it's time to start really allocating disk space and push pages out. We could still provide the standard try_to_free_buffers() and unmap_underlying_metadata() functions to operate on the per-page buffer_head lists, and existing filesystems would only have to point their address_space "private metadata" operations at the generic functions. However, a filesystem which had additional ordering constraints could then intercept the flush or writeback calls from the VM and decide on its own how best to honour the VM pressure. This even works both for hashed and unhashed anonymous buffers, *if* you allow the filesystem to attach all of its hashed buffers buffers to an address_space of its own rather than having the buffer cache pool all such buffers together. --Stephen -- Russell Cattelan [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Fri, 15 Dec 2000, Stephen C. Tweedie wrote: Hi, On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: On Thu, 14 Dec 2000, Linus Torvalds wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... Right. ext3 and reiserfs just want to submit their own IOs when it comes to the journal. (At least in ext3, already-journaled buffers can be written back by the VM freely.) It's a matter of telling the fs when that should start. What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. It would be pretty simple to remove page-buffers completely and replace it with a page-private pointer, owned by whatever address_space controlled the page. Instead of trying to unmap and flush buffers on the page directly, these operations would become address_space operations. We could still provide the standard try_to_free_buffers() and unmap_underlying_metadata() functions to operate on the per-page buffer_head lists, and existing filesystems would only have to point their address_space "private metadata" operations at the generic functions. However, a filesystem which had additional ordering constraints could then intercept the flush or writeback calls from the VM and decide on its own how best to honour the VM pressure. Stephen, The -flush() operation (which we've been discussing a bit) would be very useful now (mainly for XFS). At page_launder(), we can call -flush() if the given page has it defined. Otherwise use try_to_free_buffers() as we do now for filesystems which dont care about the special flushing treatment. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Sat, 16 Dec 2000, Jeff Chua wrote: > > > Now, I also agree that we should be able to clean this up properly for > > 2.5.x, and actually do exactly this for the anonymous buffers, so that > > the VM no longer needs to worry about buffer knowledge, and fs/buffer.c > > becomes just another user of the writepage functionality. That is not > > all that hard to do, it mainly just requires some small changes to how > > Why not incorporate this change into 2.4.x? It might be 10 lines of change, and obviously correct. And it might not be. If somebody wants to try out the DirtyPage approach for buffer handling, please do so. I'll apply it if it _does_ turn out to be as small as I suspect it might be, and if the code is straightforward and obvious. If not, we're better off leaving it for 2.5.x Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
> Now, I also agree that we should be able to clean this up properly for > 2.5.x, and actually do exactly this for the anonymous buffers, so that > the VM no longer needs to worry about buffer knowledge, and fs/buffer.c > becomes just another user of the writepage functionality. That is not > all that hard to do, it mainly just requires some small changes to how Why not incorporate this change into 2.4.x? Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
In article <[EMAIL PROTECTED]>, Stephen C. Tweedie <[EMAIL PROTECTED]> wrote: > >> What we really need is a way for VFS/VM to pass the pressure on filesystem. >> That's it. If fs wants unusual completions for requests - let it have its >> own queueing mechanism and submit these requests when it finds that convenient. > >There is a very clean way of doing this with address spaces. It's >something I would like to see done properly for 2.5: eliminate all >knowledge of buffer_heads from the VM layer. Note that you should be able to already get this effect with the current 2.4.0 tree. The way to get the VM to ignore your buffer heads is to never mark the buffers dirty, and instead mark the page they are on dirty along with giving it a mapping (you can have a special per-superblock "metadata-mapping" for stuff that isn't actually associated with any particular file. That way, the VM will just call writepage() for you, and you can use that to schedule your writeouts (you don't actually need to write the page the VM _asks_ you to write - you can just mark it dirty again and consider the writepage to be mainly a VM pressure indicator). Now, I also agree that we should be able to clean this up properly for 2.5.x, and actually do exactly this for the anonymous buffers, so that the VM no longer needs to worry about buffer knowledge, and fs/buffer.c becomes just another user of the writepage functionality. That is not all that hard to do, it mainly just requires some small changes to how "mark_buffer_dirty()" works (ie it would also mark the page dirty, so that the VM layer would know to call "writepage()"). I really think almost all of the VM infrastructure for this is in place, the PageDirty code has both simplified the VM enormously and made it a lot more powerful at the same time. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Hi, On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: > On Thu, 14 Dec 2000, Linus Torvalds wrote: > > Just one: any fs that really cares about completion callback is very likely > to be picky about the requests ordering. So sync_buffers() is very unlikely > to be useful anyway. > > In that sense we really don't have anonymous buffers here. I seriously > suspect that "unrealistic" assumption is not unrealistic at all. I'm > not sufficiently familiar with XFS code to say for sure, but... Right. ext3 and reiserfs just want to submit their own IOs when it comes to the journal. (At least in ext3, already-journaled buffers can be written back by the VM freely.) It's a matter of telling the fs when that should start. > What we really need is a way for VFS/VM to pass the pressure on filesystem. > That's it. If fs wants unusual completions for requests - let it have its > own queueing mechanism and submit these requests when it finds that convenient. There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. It would be pretty simple to remove page->buffers completely and replace it with a page->private pointer, owned by whatever address_space controlled the page. Instead of trying to unmap and flush buffers on the page directly, these operations would become address_space operations. We could still provide the standard try_to_free_buffers() and unmap_underlying_metadata() functions to operate on the per-page buffer_head lists, and existing filesystems would only have to point their address_space "private metadata" operations at the generic functions. However, a filesystem which had additional ordering constraints could then intercept the flush or writeback calls from the VM and decide on its own how best to honour the VM pressure. This even works both for hashed and unhashed anonymous buffers, *if* you allow the filesystem to attach all of its hashed buffers buffers to an address_space of its own rather than having the buffer cache pool all such buffers together. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Fri, 15 Dec 2000, Alexander Viro wrote: > Just one: any fs that really cares about completion callback is very likely > to be picky about the requests ordering. So sync_buffers() is very unlikely > to be useful anyway. > Somewhat. I guess there are at least two ways to do it. First flush the buffers where ordering matters (log blocks), then send the others onto the dirty list (general metadata). You might have your own end_io for those, and sync_buffers would lose it. Second way (reiserfs recently changed to this method) is to do all the flushing yourself, and remove the need for an end_io call back. > In that sense we really don't have anonymous buffers here. I seriously > suspect that "unrealistic" assumption is not unrealistic at all. I'm > not sufficiently familiar with XFS code to say for sure, but... > > What we really need is a way for VFS/VM to pass the pressure on filesystem. > That's it. If fs wants unusual completions for requests - let it have its > own queueing mechanism and submit these requests when it finds that convenient. > Yes, this is exactly what we've discussed. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Fri, 15 Dec 2000, Alexander Viro wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. Somewhat. I guess there are at least two ways to do it. First flush the buffers where ordering matters (log blocks), then send the others onto the dirty list (general metadata). You might have your own end_io for those, and sync_buffers would lose it. Second way (reiserfs recently changed to this method) is to do all the flushing yourself, and remove the need for an end_io call back. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Yes, this is exactly what we've discussed. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Hi, On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote: On Thu, 14 Dec 2000, Linus Torvalds wrote: Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... Right. ext3 and reiserfs just want to submit their own IOs when it comes to the journal. (At least in ext3, already-journaled buffers can be written back by the VM freely.) It's a matter of telling the fs when that should start. What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. It would be pretty simple to remove page-buffers completely and replace it with a page-private pointer, owned by whatever address_space controlled the page. Instead of trying to unmap and flush buffers on the page directly, these operations would become address_space operations. We could still provide the standard try_to_free_buffers() and unmap_underlying_metadata() functions to operate on the per-page buffer_head lists, and existing filesystems would only have to point their address_space "private metadata" operations at the generic functions. However, a filesystem which had additional ordering constraints could then intercept the flush or writeback calls from the VM and decide on its own how best to honour the VM pressure. This even works both for hashed and unhashed anonymous buffers, *if* you allow the filesystem to attach all of its hashed buffers buffers to an address_space of its own rather than having the buffer cache pool all such buffers together. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
In article [EMAIL PROTECTED], Stephen C. Tweedie [EMAIL PROTECTED] wrote: What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. There is a very clean way of doing this with address spaces. It's something I would like to see done properly for 2.5: eliminate all knowledge of buffer_heads from the VM layer. Note that you should be able to already get this effect with the current 2.4.0 tree. The way to get the VM to ignore your buffer heads is to never mark the buffers dirty, and instead mark the page they are on dirty along with giving it a mapping (you can have a special per-superblock "metadata-mapping" for stuff that isn't actually associated with any particular file. That way, the VM will just call writepage() for you, and you can use that to schedule your writeouts (you don't actually need to write the page the VM _asks_ you to write - you can just mark it dirty again and consider the writepage to be mainly a VM pressure indicator). Now, I also agree that we should be able to clean this up properly for 2.5.x, and actually do exactly this for the anonymous buffers, so that the VM no longer needs to worry about buffer knowledge, and fs/buffer.c becomes just another user of the writepage functionality. That is not all that hard to do, it mainly just requires some small changes to how "mark_buffer_dirty()" works (ie it would also mark the page dirty, so that the VM layer would know to call "writepage()"). I really think almost all of the VM infrastructure for this is in place, the PageDirty code has both simplified the VM enormously and made it a lot more powerful at the same time. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
Now, I also agree that we should be able to clean this up properly for 2.5.x, and actually do exactly this for the anonymous buffers, so that the VM no longer needs to worry about buffer knowledge, and fs/buffer.c becomes just another user of the writepage functionality. That is not all that hard to do, it mainly just requires some small changes to how Why not incorporate this change into 2.4.x? Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Sat, 16 Dec 2000, Jeff Chua wrote: Now, I also agree that we should be able to clean this up properly for 2.5.x, and actually do exactly this for the anonymous buffers, so that the VM no longer needs to worry about buffer knowledge, and fs/buffer.c becomes just another user of the writepage functionality. That is not all that hard to do, it mainly just requires some small changes to how Why not incorporate this change into 2.4.x? It might be 10 lines of change, and obviously correct. And it might not be. If somebody wants to try out the DirtyPage approach for buffer handling, please do so. I'll apply it if it _does_ turn out to be as small as I suspect it might be, and if the code is straightforward and obvious. If not, we're better off leaving it for 2.5.x Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Thu, 14 Dec 2000, Linus Torvalds wrote: > Good point. > > This actually looks fairly nasty to fix. The obvious fix would be to not > put such buffers on the dirty list at all, and instead rely on the VM > layer calling "writepage()" when it wants to push out the pages. > That would be the nice behaviour from a VM standpoint. > > However, that assumes that you don't have any "anonymous" buffers, which > is probably an unrealistic assumption. > > The problem is that we don't have any per-buffer "writebuffer()" function, > the way we have them per-page. It was never needed for any of the normal > filesystems, and XFS just happened to be able to take advantage of the > b_end_io behaviour. > > Suggestions welcome. Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Stephen, you probably already thought about that area. Could you comment on that? Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Thu, 14 Dec 2000, Russell Cattelan wrote: > > Ok one more wrinkle. > sync_buffers calls ll_rw_block, this is going to have the same problem as > calling ll_rw_block directly. Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Thu, 14 Dec 2000, Russell Cattelan wrote: > > So one more observation in > filemap_sync_pte > > lock_page(page); > error = filemap_write_page(page, 1); > -> UnlockPage(page); > This unlock page was removed? is that correct? Yes. The "writepage" thing changed: "struct file" disappeared (as I'm sure you also noticed), and the page writer is supposed to unlock the page itself. Which it may do at any time, of course. There are some reasons to do it only after the IO has actually completed: this way the VM layer won't try to write it out _again_ before the first IO hasn't even finished yet, and the writing logic can possibly be simplified if you know that nobody else will be touching that page. But that is up to you: you can do the UnlockPage before even returning from your "->writepage()" function, if you choose to do so. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
In article <[EMAIL PROTECTED]>, Russell Cattelan <[EMAIL PROTECTED]> wrote: >This would seem to be an error on the part of ll_rw_block. >Setting b_end_io to a default handler without checking to see >a callback has already been defined defeats the purpose of having >a function op. No. It just means that if you have your own function op, you had better not call "ll_rw_block()". The problem is that as it was done before, people would set the function op without actually holding the buffer lock. Which meant that you could have two unrelated users setting the function op to two different things, and it would be only a matter of the purest luck which one happened to "win". If you want to set your own end-operation, you now need to lock the buffer _first_, then set "b_end_io" to your operation, and then do a "submit_bh()". You cannot use ll_rw_block(). Yes, this is different than before. Sorry about that. But yes, this way actually happens to work reliably. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
In article [EMAIL PROTECTED], Russell Cattelan [EMAIL PROTECTED] wrote: This would seem to be an error on the part of ll_rw_block. Setting b_end_io to a default handler without checking to see a callback has already been defined defeats the purpose of having a function op. No. It just means that if you have your own function op, you had better not call "ll_rw_block()". The problem is that as it was done before, people would set the function op without actually holding the buffer lock. Which meant that you could have two unrelated users setting the function op to two different things, and it would be only a matter of the purest luck which one happened to "win". If you want to set your own end-operation, you now need to lock the buffer _first_, then set "b_end_io" to your operation, and then do a "submit_bh()". You cannot use ll_rw_block(). Yes, this is different than before. Sorry about that. But yes, this way actually happens to work reliably. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Thu, 14 Dec 2000, Russell Cattelan wrote: So one more observation in filemap_sync_pte lock_page(page); error = filemap_write_page(page, 1); - UnlockPage(page); This unlock page was removed? is that correct? Yes. The "writepage" thing changed: "struct file" disappeared (as I'm sure you also noticed), and the page writer is supposed to unlock the page itself. Which it may do at any time, of course. There are some reasons to do it only after the IO has actually completed: this way the VM layer won't try to write it out _again_ before the first IO hasn't even finished yet, and the writing logic can possibly be simplified if you know that nobody else will be touching that page. But that is up to you: you can do the UnlockPage before even returning from your "-writepage()" function, if you choose to do so. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Thu, 14 Dec 2000, Russell Cattelan wrote: Ok one more wrinkle. sync_buffers calls ll_rw_block, this is going to have the same problem as calling ll_rw_block directly. Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Test12 ll_rw_block error.
On Thu, 14 Dec 2000, Linus Torvalds wrote: Good point. This actually looks fairly nasty to fix. The obvious fix would be to not put such buffers on the dirty list at all, and instead rely on the VM layer calling "writepage()" when it wants to push out the pages. That would be the nice behaviour from a VM standpoint. However, that assumes that you don't have any "anonymous" buffers, which is probably an unrealistic assumption. The problem is that we don't have any per-buffer "writebuffer()" function, the way we have them per-page. It was never needed for any of the normal filesystems, and XFS just happened to be able to take advantage of the b_end_io behaviour. Suggestions welcome. Just one: any fs that really cares about completion callback is very likely to be picky about the requests ordering. So sync_buffers() is very unlikely to be useful anyway. In that sense we really don't have anonymous buffers here. I seriously suspect that "unrealistic" assumption is not unrealistic at all. I'm not sufficiently familiar with XFS code to say for sure, but... What we really need is a way for VFS/VM to pass the pressure on filesystem. That's it. If fs wants unusual completions for requests - let it have its own queueing mechanism and submit these requests when it finds that convenient. Stephen, you probably already thought about that area. Could you comment on that? Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/