Re: Test12 ll_rw_block error.

2000-12-19 Thread Marcelo Tosatti



On Tue, 19 Dec 2000, Daniel Phillips wrote:

> Marcelo Tosatti wrote:
> > 
> > On Mon, 18 Dec 2000, Stephen C. Tweedie wrote:
> > 
> > > Hi,
> > >
> > > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> > > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> > > >
> > > > Stephen,
> > > >
> > > > The ->flush() operation (which we've been discussing a bit) would be very
> > > > useful now (mainly for XFS).
> > > >
> > > > At page_launder(), we can call ->flush() if the given page has it defined.
> > > > Otherwise use try_to_free_buffers() as we do now for filesystems which
> > > > don't care about the special flushing treatment.
> > >
> > > As of 2.4.0test12, page_launder() will already call the
> > > per-address-space writepage() operation for dirty pages.  Do you need
> > > something similar for clean pages too, or does Linus's new laundry
> > > code give you what you need now?
> > 
> > I think the semantics of the filesystem specific ->flush and ->writepage
> > are not the same.
> > 
> > Is it ok for filesystem-specific writepage() code to sync other "physically
> > contiguous" dirty pages with reference to the one requested by
> > writepage()?
> > 
> > If so, it can do the same job as the ->flush() idea we've been discussing.
> 
> Except that for ->writepage you don't have the option of *not* writing
> the specified page.

It does.

Both the swapper writepage() operation and shm_writepage() may be unable
to write the page.
 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Test12 ll_rw_block error.

2000-12-19 Thread Christoph Hellwig


In article <[EMAIL PROTECTED]> you wrote:
>> I think the semantics of the filesystem specific ->flush and ->writepage
>> are not the same.
>> 
>> Is it ok for filesystem-specific writepage() code to sync other "physically
>> contiguous" dirty pages with reference to the one requested by
>> writepage()?
>> 
>> If so, it can do the same job as the ->flush() idea we've been discussing.
>
> Except that for ->writepage you don't have the option of *not* writing
> the specified page.

In current -test13pre you have.

Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

Re: Test12 ll_rw_block error.

2000-12-19 Thread Daniel Phillips


Marcelo Tosatti wrote:
> 
> On Mon, 18 Dec 2000, Stephen C. Tweedie wrote:
> 
> > Hi,
> >
> > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> > >
> > > Stephen,
> > >
> > > The ->flush() operation (which we've been discussing a bit) would be very
> > > useful now (mainly for XFS).
> > >
> > > At page_launder(), we can call ->flush() if the given page has it defined.
> > > Otherwise use try_to_free_buffers() as we do now for filesystems which
> > > don't care about the special flushing treatment.
> >
> > As of 2.4.0test12, page_launder() will already call the
> > per-address-space writepage() operation for dirty pages.  Do you need
> > something similar for clean pages too, or does Linus's new laundry
> > code give you what you need now?
> 
> I think the semantics of the filesystem specific ->flush and ->writepage
> are not the same.
> 
> Is it ok for filesystem-specific writepage() code to sync other "physically
> contiguous" dirty pages with reference to the one requested by
> writepage()?
> 
> If so, it can do the same job as the ->flush() idea we've been discussing.

Except that for ->writepage you don't have the option of *not* writing
the specified page.

--
Daniel

Re: Test12 ll_rw_block error.

2000-12-19 Thread Marcelo Tosatti



On Mon, 18 Dec 2000, Stephen C. Tweedie wrote:

> Hi,
> 
> On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> > 
> > Stephen,
> > 
> > The ->flush() operation (which we've been discussing a bit) would be very
> > useful now (mainly for XFS).
> > 
> > At page_launder(), we can call ->flush() if the given page has it defined.
> > Otherwise use try_to_free_buffers() as we do now for filesystems which
> > don't care about the special flushing treatment.
> 
> As of 2.4.0test12, page_launder() will already call the
> per-address-space writepage() operation for dirty pages.  Do you need
> something similar for clean pages too, or does Linus's new laundry
> code give you what you need now?

I think the semantics of the filesystem specific ->flush and ->writepage
are not the same.

Is it ok for filesystem-specific writepage() code to sync other "physically
contiguous" dirty pages with reference to the one requested by
writepage()?

If so, it can do the same job as the ->flush() idea we've been discussing.



Re: Test12 ll_rw_block error.

2000-12-18 Thread Stephen C. Tweedie


Hi,

On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> 
> Stephen,
> 
> The ->flush() operation (which we've been discussing a bit) would be very
> useful now (mainly for XFS).
> 
> At page_launder(), we can call ->flush() if the given page has it defined.
> Otherwise use try_to_free_buffers() as we do now for filesystems which
> don't care about the special flushing treatment.

As of 2.4.0test12, page_launder() will already call the
per-address-space writepage() operation for dirty pages.  Do you need
something similar for clean pages too, or does Linus's new laundry
code give you what you need now?

Cheers,
 Stephen

Re: Test12 ll_rw_block error.

2000-12-18 Thread Stephen C. Tweedie

Hi,

On Sat, Dec 16, 2000 at 07:08:02PM -0600, Russell Cattelan wrote:
> > There is a very clean way of doing this with address spaces.  It's
> > something I would like to see done properly for 2.5: eliminate all
> > knowledge of buffer_heads from the VM layer.  It would be pretty
> > simple to remove page->buffers completely and replace it with a
> > page->private pointer, owned by whatever address_space controlled the
> > page.  Instead of trying to unmap and flush buffers on the page
> > directly, these operations would become address_space operations.
> 
> Yes this is a lot of what page buf would like to do eventually.
> Have the VM system pressure page_buf for pages which would
> then be able to intelligently call the file system to free up cached pages.
> A big part of getting Delay Alloc to not completely consume all the
> system pages, is being told when it's time to start really allocating disk
> space and push pages out.

Delayed allocation is actually much easier, since it's entirely an
operation on logical page addresses, not physical ones --- by
definition you don't have any buffer_heads yet because you haven't
decided on the disk blocks.  If you're just dealing with pages, not
blocks, then the address_space is the natural way of dealing with it
already.  

Only the full semantics of the flush callback have been missing to
date, and with 2.4.0-test12 even that is mostly solved, since
page_launder will give you the writepage() callbacks you need to flush
things to disk when you start getting memory pressure.  You can even
treat the writepage() as an advisory call.  
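
A minimal userspace sketch of what "advisory" could mean here, not kernel code. The names `demo_page`, `demo_writepage`, `demo_launder` and the `pinned_by_fs` flag are hypothetical stand-ins. The point is only that the hook may decline a request and leave the page dirty, and the caller simply moves on.

```c
#include <assert.h>
#include <stddef.h>

enum wp_result { WP_WRITTEN, WP_SKIPPED };

/* Hypothetical stand-in for struct page; not the 2.4 definition. */
struct demo_page {
    int dirty;
    int pinned_by_fs;   /* assumption: fs cannot write this page yet */
};

/* Hypothetical filesystem writepage: free to decline the request. */
enum wp_result demo_writepage(struct demo_page *page)
{
    assert(page != NULL);
    if (page->pinned_by_fs)
        return WP_SKIPPED;      /* advisory: leave the page dirty */
    page->dirty = 0;            /* pretend the I/O was started */
    return WP_WRITTEN;
}

/* Walk an array of pages; return how many were actually cleaned. */
size_t demo_launder(struct demo_page *pages, size_t n)
{
    size_t cleaned = 0;

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty)
            continue;
        if (demo_writepage(&pages[i]) == WP_WRITTEN)
            cleaned++;
    }
    return cleaned;
}
```

A skipped page stays dirty and would be offered again on a later pass, which is how the filesystem keeps control over when the write really happens.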

--Stephen
Re: Test12 ll_rw_block error.

2000-12-17 Thread Chris Mason




On Sat, 16 Dec 2000, Russell Cattelan wrote:
> >
> I'm curious about this.
> Does this mean reiserFS is doing all of its own buffer management?
> 
> This would seem a little redundant with what is already in the kernel?
> 
For metadata only, reiserfs does its own write management.  The buffers
come from getblk.  We just don't mark the buffers dirty for flushing by
flush_dirty_buffers().

This has the advantage of avoiding races against bdflush and friends, and
makes it easier to keep track of which buffers have actually made their
way to disk.  It has all of the obvious disadvantages with respect to
memory pressure.

-chris

Re: Test12 ll_rw_block error.

2000-12-16 Thread Marcelo Tosatti



On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:

> Hi,
> 
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> > 
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> > 
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
> 
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
> 
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
> 
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.
> 
> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.

Stephen,

The ->flush() operation (which we've been discussing a bit) would be very
useful now (mainly for XFS).

At page_launder(), we can call ->flush() if the given page has it defined.
Otherwise use try_to_free_buffers() as we do now for filesystems which
don't care about the special flushing treatment.
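
A minimal userspace sketch of that dispatch, not kernel code. `demo_page`, `demo_aops`, `launder_one`, `generic_free_buffers` and `example_fs_flush` are hypothetical stand-ins for `struct page`, the address_space operations, the per-page step of `page_launder()`, `try_to_free_buffers()`, and a filesystem hook. The `->flush()` member itself is only the proposal under discussion.

```c
#include <assert.h>
#include <stddef.h>

struct demo_page;

/* Hypothetical operations table; flush is NULL when the fs does not care. */
struct demo_aops {
    int (*flush)(struct demo_page *page);
};

struct demo_page {
    const struct demo_aops *aops;
    int buffers_freed;   /* set by the generic path */
    int fs_flushed;      /* set by a filesystem-specific hook */
};

/* Stand-in for try_to_free_buffers(): the default path. */
int generic_free_buffers(struct demo_page *page)
{
    page->buffers_freed = 1;
    return 1;
}

/* Example hook for a filesystem with its own flushing rules. */
int example_fs_flush(struct demo_page *page)
{
    page->fs_flushed = 1;
    return 1;
}

/* What the laundering loop would do for one page. */
int launder_one(struct demo_page *page)
{
    assert(page != NULL);
    if (page->aops && page->aops->flush)
        return page->aops->flush(page);
    return generic_free_buffers(page);
}
```

Filesystems that leave the hook unset keep today's behaviour, so only the filesystems that need special ordering have to implement anything.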




Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


"Stephen C. Tweedie" wrote:

> Hi,
>
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> >
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
>
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
>
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
>
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.

Yes, this is a lot of what page_buf would like to do eventually.
Have the VM system pressure page_buf for pages, which would
then be able to intelligently call the file system to free up cached pages.
A big part of getting Delay Alloc to not completely consume all the
system pages is being told when it's time to start really allocating disk
space and push pages out.


>
>
> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.
>
> This even works both for hashed and unhashed anonymous buffers, *if*
> you allow the filesystem to attach all of its hashed buffers
> to an address_space of its own rather than having the buffer cache
> pool all such buffers together.
>
> --Stephen

--
Russell Cattelan
[EMAIL PROTECTED]




Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Chris Mason wrote:

> On Fri, 15 Dec 2000, Alexander Viro wrote:
>
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> Somewhat.  I guess there are at least two ways to do it.  First flush the
> buffers where ordering matters (log blocks), then send the others onto the
> dirty list (general metadata).  You might have your own end_io for those, and
> sync_buffers would lose it.
>
> Second way (reiserfs recently changed to this method) is to do all the
> flushing yourself, and remove the need for an end_io call back.
>
I'm curious about this.
Does this mean reiserFS is doing all of its own buffer management?

This would seem a little redundant with what is already in the kernel?

>
>
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
> >
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
> >
> Yes, this is exactly what we've discussed.
>
> -chris

--
Russell Cattelan
[EMAIL PROTECTED]




Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan

Alexander Viro wrote:

> On Thu, 14 Dec 2000, Linus Torvalds wrote:
>
> > Good point.
> >
> > This actually looks fairly nasty to fix. The obvious fix would be to not
> > put such buffers on the dirty list at all, and instead rely on the VM
> > layer calling "writepage()" when it wants to push out the pages.
> > That would be the nice behaviour from a VM standpoint.
> >
> > However, that assumes that you don't have any "anonymous" buffers, which
> > is probably an unrealistic assumption.
> >
> > The problem is that we don't have any per-buffer "writebuffer()" function,
> > the way we have them per-page. It was never needed for any of the normal
> > filesystems, and XFS just happened to be able to take advantage of the
> > b_end_io behaviour.
> >
> > Suggestions welcome.
>
> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.

Actually no, that's not the issue.

The XFS log uses an LSN (Log Sequence Number) to keep track of log write ordering.
Sync IO on each log buffer isn't realistic; the performance hit would be too great.

I wasn't around when most of XFS was developed, but from what I understand it
was discovered early on that firing off writes in a particular order doesn't
guarantee that they will finish in that order.  Thus the implementation of a
sequence number for each log write.
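
To illustrate why a sequence number helps when completions arrive out of order, here is a small userspace sketch. It is not the XFS algorithm; `demo_log`, `demo_log_issue`, `demo_log_complete` and the fixed-size table are assumptions made for the example. It tracks the point up to which every issued write has finished.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOG_WRITES 64

struct demo_log {
    bool done[MAX_LOG_WRITES]; /* done[lsn] is set when that write completes */
    int next_lsn;              /* next sequence number to hand out */
    int stable_lsn;            /* every write with lsn < stable_lsn is done */
};

/* Hand out the sequence number for a new log write. */
int demo_log_issue(struct demo_log *lg)
{
    assert(lg->next_lsn < MAX_LOG_WRITES);
    return lg->next_lsn++;
}

/* Completion handler: may be called in any order. */
void demo_log_complete(struct demo_log *lg, int lsn)
{
    assert(lsn >= 0 && lsn < lg->next_lsn);
    lg->done[lsn] = true;
    /* Advance past every write that has finished, in issue order. */
    while (lg->stable_lsn < lg->next_lsn && lg->done[lg->stable_lsn])
        lg->stable_lsn++;
}
```

If three writes are issued and the third completes first, `stable_lsn` stays at 0 until the first one finishes, so nothing is treated as ordered on disk earlier than it should be.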

One of the obstacles we ran into early on in the Linux port was the fact that
Linux used fixed-size IO requests to any given device, but most of XFS's
metadata structures vary in size in multiples of 512 bytes.

We were also implementing a page caching / clustering layer called
page_buf, which understands primarily pages and not necessarily
disk blocks.  If your FS block size happens to match your page size then things
are good, but it often doesn't.
So we added a bitmap field to the page structure.
Each bit then represents one BASIC BLOCK, i.e. 512 bytes for all practical purposes.

The end_io functions XFS defines update the correct bit, or the whole bit array
if the whole page is valid, thus signaling to the rest of page_buf that the IO
has completed.

OK, there is a lot more to it than what I've just described, but you probably get
the idea.
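
A rough userspace sketch of the per-page bitmap idea, assuming a 4096-byte page and 512-byte basic blocks (eight bits per page). `demo_page`, `demo_mark_valid` and `demo_page_uptodate` are hypothetical names, not the page_buf interface.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE   4096u   /* assumption: 4 KiB pages */
#define BASIC_BLOCK_SIZE 512u
#define BLOCKS_PER_PAGE  (DEMO_PAGE_SIZE / BASIC_BLOCK_SIZE)   /* 8 */
#define ALL_VALID        ((uint8_t)((1u << BLOCKS_PER_PAGE) - 1))

struct demo_page {
    uint8_t valid;  /* bit i set: basic block i of the page holds valid data */
};

/* Called from an I/O completion handler for bytes [offset, offset + len). */
void demo_mark_valid(struct demo_page *page, unsigned offset, unsigned len)
{
    assert(offset % BASIC_BLOCK_SIZE == 0 && len % BASIC_BLOCK_SIZE == 0);
    assert(offset + len <= DEMO_PAGE_SIZE);

    unsigned first = offset / BASIC_BLOCK_SIZE;
    unsigned count = len / BASIC_BLOCK_SIZE;

    for (unsigned i = first; i < first + count; i++)
        page->valid |= (uint8_t)(1u << i);
}

/* The page as a whole is usable only once every basic block is valid. */
int demo_page_uptodate(const struct demo_page *page)
{
    return page->valid == ALL_VALID;
}
```

A 1024-byte completion at offset 512 sets only bits 1 and 2, so the rest of the page is still known to be missing.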

>
>
> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...
>
> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.
>
> Stephen, you probably already thought about that area. Could you comment on
> that?
> Cheers,
> Al

--
Russell Cattelan
[EMAIL PROTECTED]


Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan

Linus Torvalds wrote:

> On Thu, 14 Dec 2000, Russell Cattelan wrote:
> >
> > Ok one more wrinkle.
> > sync_buffers calls ll_rw_block, this is going to have the same problem as
> > calling ll_rw_block directly.
>
> Good point.
>
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
>
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
>
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
>
> Suggestions welcome.
>
> Linus

Ok after a bit of trial and error I do have something working.
I wouldn't call it the most elegant solution but it does work
and it isn't very intrusive.

#define BH_End_io  7    /* End io function defined, don't remap it */

/* don't change the callback if somebody explicitly set it */

if (!test_bit(BH_End_io, &bh->b_state)) {
        bh->b_end_io = end_buffer_io_sync;
}

What I've done in the XFS buffer_head setup functions is
set the initial value of b_state to BH_Locked and BH_End_io,
set the callback function and the rest of the relevant fields, and then unlock
the buffer.

The only other quick fix that comes to mind is to change sync_buffers to use
submit_bh rather than ll_rw_block.
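
The flag-guard idea can be tried out in userspace with a sketch like the following. `demo_bh`, `demo_fs_setup`, `demo_submit` and the two handlers are hypothetical stand-ins for the buffer_head, the XFS setup code, the generic submission path and the end_io callbacks. Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BH_END_IO 7   /* bit number, mirroring the BH_End_io idea */

struct demo_bh {
    unsigned long state;
    void (*end_io)(struct demo_bh *bh, int uptodate);
    int completed_by;   /* 1 = default handler, 2 = filesystem handler */
};

void demo_default_end_io(struct demo_bh *bh, int uptodate)
{
    (void)uptodate;
    bh->completed_by = 1;
}

void demo_fs_end_io(struct demo_bh *bh, int uptodate)
{
    (void)uptodate;
    bh->completed_by = 2;
}

/* Filesystem setup: install a private handler and mark it as such. */
void demo_fs_setup(struct demo_bh *bh)
{
    bh->end_io = demo_fs_end_io;
    bh->state |= 1UL << DEMO_BH_END_IO;
}

/* Generic path: install the default handler only if none was claimed. */
void demo_submit(struct demo_bh *bh)
{
    assert(bh != NULL);
    if (!(bh->state & (1UL << DEMO_BH_END_IO)))
        bh->end_io = demo_default_end_io;
    bh->end_io(bh, 1);   /* pretend the I/O completed immediately */
}
```

A buffer prepared with `demo_fs_setup()` keeps its own completion handler through `demo_submit()`, while an untouched buffer gets the default one.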

--
Russell Cattelan
[EMAIL PROTECTED]


Re: Test12 ll_rw_block error.

2000-12-16 Thread Chris Mason




On Sat, 16 Dec 2000, Linus Torvalds wrote:

> Your patch looks fine, although I'd personally prefer this one even more:
> 
Yes, that looks better and works here.

-chris


Re: Test12 ll_rw_block error.

2000-12-16 Thread Chris Mason

On Fri, 15 Dec 2000, Linus Torvalds wrote:

[ writepage for anon buffers ]

> 
> It might be 10 lines of change, and obviously correct.
> 
I'll give this a try; it will be interesting regardless of whether it is simple
enough for kernel inclusion.

On a related note, I hit a snag porting reiserfs into test12, where
block_read_full_page assumes the buffer it gets back from get_block won't
be up to date.  When reiserfs reads a packed tail directly into the page,
reading the  newly mapped buffer isn't required, and is actually a bad
idea, since the packed tails have a block number of 0 when copied into
the page cache.

In other words, after calling reiserfs_get_block, the buffer might be
mapped and uptodate, with no i/o required in block_read_full_page

The following patch to block_read_full_page fixes things for me, and seems
like a good idea in general.  It might be better to apply something
similar to submit_bh instead...comments?

-chris

--- linux-test12/fs/buffer.c	Mon Dec 18 11:37:42 2000
+++ linux/fs/buffer.c	Mon Dec 18 11:38:36 2000
@@ -1706,8 +1706,10 @@
 			}
 		}
 
-		arr[nr] = bh;
-		nr++;
+		if (!buffer_uptodate(bh)) {
+			arr[nr] = bh;
+			nr++;
+		}
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
 
 	if (!nr) {
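
The effect of the patch can be shown in isolation with a small userspace sketch. `demo_bh` and `demo_collect_for_read` are hypothetical names. Only the circular per-page list walk and the uptodate test mirror the loop above.

```c
#include <assert.h>
#include <stddef.h>

struct demo_bh {
    int uptodate;
    struct demo_bh *this_page;   /* circular list of buffers on one page */
};

/* Fill arr with the buffers that still need reading; return the count. */
size_t demo_collect_for_read(struct demo_bh *head, struct demo_bh **arr,
                             size_t max)
{
    size_t nr = 0;
    struct demo_bh *bh = head;

    assert(head != NULL);
    do {
        /* Skip buffers the filesystem already filled in, e.g. a packed tail. */
        if (!bh->uptodate && nr < max)
            arr[nr++] = bh;
    } while ((bh = bh->this_page) != head);

    return nr;
}
```

When every buffer on the page is already up to date the function returns 0, which corresponds to the `if (!nr)` case where no I/O needs to be submitted.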


Re: Test12 ll_rw_block error.

2000-12-16 Thread Chris Mason




On Sat, 16 Dec 2000, Linus Torvalds wrote:

> Your patch looks fine, although I'd personally prefer this one even more:
>
Yes, that looks better and works here.

-chris


Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Linus Torvalds wrote:

> On Thu, 14 Dec 2000, Russell Cattelan wrote:
> >
> > Ok one more wrinkle.
> > sync_buffers calls ll_rw_block, this is going to have the same problem as
> > calling ll_rw_block directly.
>
> Good point.
>
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
>
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
>
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
>
> Suggestions welcome.
>
> Linus

Ok after a bit of trial and error I do have something working.
I wouldn't call it the most elegant solution but it does work
and it isn't very intrusive.

#define BH_End_io  7    /* End io function defined, don't remap it */

/* don't change the callback if somebody explicitly set it */

if (!test_bit(BH_End_io, &bh->b_state)) {
        bh->b_end_io = end_buffer_io_sync;
}

What I've done in the XFS buffer_head setup functions is set the initial
value of b_state to BH_Locked and BH_End_io, set the callback function and
the rest of the relevant fields, and then unlock the buffer.

The only other quick fix that comes to mind is to change sync_buffers to use
submit_bh rather than ll_rw_block.
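
The BH_End_io guard described above can be modelled in user space. This is
a sketch under stated assumptions, not the XFS or kernel code: struct toy_bh,
toy_set_end_io(), toy_prepare_for_io() and the two handlers are hypothetical
names, and a plain bit mask replaces test_bit(). It only illustrates that the
generic path leaves an explicitly claimed callback alone.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bh;
typedef void (*end_io_fn)(struct toy_bh *bh, int uptodate);

enum { TOY_BH_END_IO = 7 };     /* bit number, mirrors the proposed BH_End_io */

struct toy_bh {
    unsigned long state;        /* bit flags, like b_state */
    end_io_fn end_io;           /* completion callback, like b_end_io */
};

/* Default completion handler, standing in for end_buffer_io_sync(). */
void toy_end_io_sync(struct toy_bh *bh, int uptodate)
{
    (void)bh;
    (void)uptodate;
}

/* A filesystem-specific handler that must not be overwritten. */
void toy_fs_end_io(struct toy_bh *bh, int uptodate)
{
    (void)bh;
    (void)uptodate;
}

/* Owner claims the callback: record it and set the guard bit. */
void toy_set_end_io(struct toy_bh *bh, end_io_fn fn)
{
    assert(fn != NULL);
    bh->end_io = fn;
    bh->state |= 1UL << TOY_BH_END_IO;
}

/* Generic submit path: install the default only if nobody claimed it. */
void toy_prepare_for_io(struct toy_bh *bh)
{
    if (!(bh->state & (1UL << TOY_BH_END_IO)))
        bh->end_io = toy_end_io_sync;
}
```

A buffer that never went through toy_set_end_io() still gets the default
handler, so callers that do not care about completion see no change.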

--
Russell Cattelan
[EMAIL PROTECTED]




Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Alexander Viro wrote:

> On Thu, 14 Dec 2000, Linus Torvalds wrote:
>
> > Good point.
> >
> > This actually looks fairly nasty to fix. The obvious fix would be to not
> > put such buffers on the dirty list at all, and instead rely on the VM
> > layer calling "writepage()" when it wants to push out the pages.
> > That would be the nice behaviour from a VM standpoint.
> >
> > However, that assumes that you don't have any "anonymous" buffers, which
> > is probably an unrealistic assumption.
> >
> > The problem is that we don't have any per-buffer "writebuffer()" function,
> > the way we have them per-page. It was never needed for any of the normal
> > filesystems, and XFS just happened to be able to take advantage of the
> > b_end_io behaviour.
> >
> > Suggestions welcome.
>
> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.

Actually no, that's not the issue.

The XFS log uses an LSN (Log Sequence Number) to keep track of log write ordering.
Sync IO on each log buffer isn't realistic; the performance hit would be too great.

I wasn't around when most of XFS was developed, but from what I understand it
was discovered early on that firing off writes in a particular order doesn't
guarantee that they will finish in that order.  Thus the implementation of a
sequence number for each log write.
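
To make the ordering problem concrete, here is a toy model of sequence-numbered
log writes that may complete in any order. It is an illustration only and
makes no claim about how XFS actually tracks its LSNs: struct toy_log, the
fixed window of 8 slots and all three functions are assumptions made for
this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LOG_SLOTS 8         /* arbitrary window size for the sketch */

struct toy_log {
    bool done[TOY_LOG_SLOTS];   /* done[i]: write with sequence number i finished */
    size_t next_lsn;            /* next sequence number to hand out */
};

/*
 * Issue a log write and return the sequence number stamped on it.
 * Returns TOY_LOG_SLOTS when the toy window is full.
 */
size_t toy_log_issue(struct toy_log *log)
{
    if (log->next_lsn >= TOY_LOG_SLOTS)
        return TOY_LOG_SLOTS;
    log->done[log->next_lsn] = false;
    return log->next_lsn++;
}

/* Completion callback: may be called in any order. */
void toy_log_complete(struct toy_log *log, size_t lsn)
{
    assert(lsn < log->next_lsn);
    log->done[lsn] = true;
}

/*
 * Number of writes that are complete in sequence order: every sequence
 * number below the returned value has finished.
 */
size_t toy_log_stable_count(const struct toy_log *log)
{
    size_t n = 0;

    while (n < log->next_lsn && log->done[n])
        n++;
    return n;
}
```

If writes 0, 1 and 2 are issued and write 2 finishes first, the stable count
stays at 0 until write 0 completes, which is the property a sequence number
buys you without forcing synchronous IO on each log buffer.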


One of the obstacles we ran into early on in the linux port was the fact that
linux used fixed size IO requests to any given device.
But most of XFS's metadata structures vary in size in multiples of 512 bytes.

We were also implementing a page caching / clustering layer called
page_buf which understands primarily pages and not necessarily
disk blocks. If your FS block size happens to match your page size then things
are good, but in general it doesn't.
So we added a bitmap field to the page structure.
Each bit then represents one BASIC BLOCK, i.e. 512 bytes for all practical purposes.

The end_io functions XFS defines update the correct bit, or the whole bit array
if the whole page is valid, thus signaling the rest of page_buf that the io
has completed.
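
A per-page validity bitmap of this kind is easy to sketch. The code below is
an assumption-laden illustration, not page_buf itself: it assumes a 4096-byte
page (so 8 basic blocks of 512 bytes), and struct toy_page,
toy_page_io_done() and toy_page_fully_valid() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE   4096u   /* assumed page size */
#define TOY_BB_SIZE     512u    /* one basic block */
#define TOY_BB_PER_PAGE (TOY_PAGE_SIZE / TOY_BB_SIZE)   /* 8 */
#define TOY_ALL_VALID   ((1u << TOY_BB_PER_PAGE) - 1u)  /* 0xff */

struct toy_page {
    unsigned int valid;         /* bit i set: basic block i holds valid data */
};

/*
 * Called from the I/O completion path: mark the basic blocks covering
 * [offset, offset + len) as valid. Both values must be multiples of
 * TOY_BB_SIZE and the range must stay inside the page.
 */
void toy_page_io_done(struct toy_page *page, size_t offset, size_t len)
{
    assert(offset % TOY_BB_SIZE == 0 && len % TOY_BB_SIZE == 0);
    assert(offset + len <= TOY_PAGE_SIZE);

    for (size_t bb = offset / TOY_BB_SIZE;
         bb < (offset + len) / TOY_BB_SIZE; bb++)
        page->valid |= 1u << bb;
}

/* True once every basic block of the page is valid. */
bool toy_page_fully_valid(const struct toy_page *page)
{
    return page->valid == TOY_ALL_VALID;
}
```

A 1536-byte metadata read at offset 1024 sets bits 2, 3 and 4 only, so the
rest of the page is still known to be invalid.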

Ok there is a lot more to it than what I've just described but you probably get
the idea.




> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...
>
> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.
>
> Stephen, you probably already thought about that area. Could you comment on
> that?
> Cheers,
> Al

--
Russell Cattelan
[EMAIL PROTECTED]




Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


Chris Mason wrote:

> On Fri, 15 Dec 2000, Alexander Viro wrote:
>
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> Somewhat.  I guess there are at least two ways to do it.  First flush the
> buffers where ordering matters (log blocks), then send the others onto the
> dirty list (general metadata).  You might have your own end_io for those, and
> sync_buffers would lose it.
>
> Second way (reiserfs recently changed to this method) is to do all the
> flushing yourself, and remove the need for an end_io call back.

I'm curious about this.
Does this mean reiserFS is doing all of its own buffer management?

That would seem a little redundant with what is already in the kernel.



> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
> >
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
> >
> Yes, this is exactly what we've discussed.
>
> -chris

--
Russell Cattelan
[EMAIL PROTECTED]




Re: Test12 ll_rw_block error.

2000-12-16 Thread Russell Cattelan


"Stephen C. Tweedie" wrote:

> Hi,
>
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> >
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
>
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
>
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
>
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.

Yes, this is a lot of what page_buf would like to do eventually:
have the VM system pressure page_buf for pages, which would
then be able to intelligently call the file system to free up cached pages.
A big part of getting Delay Alloc to not completely consume all the
system pages is being told when it's time to start really allocating disk
space and push pages out.




> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.
>
> This even works both for hashed and unhashed anonymous buffers, *if*
> you allow the filesystem to attach all of its hashed buffers
> to an address_space of its own rather than having the buffer cache
> pool all such buffers together.
>
> --Stephen

--
Russell Cattelan
[EMAIL PROTECTED]




Re: Test12 ll_rw_block error.

2000-12-16 Thread Marcelo Tosatti



On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:

> Hi,
>
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> >
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
>
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
>
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
>
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.
>
> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.

Stephen,

The ->flush() operation (which we've been discussing a bit) would be very
useful now (mainly for XFS).

At page_launder(), we can call ->flush() if the given page has it defined.
Otherwise use try_to_free_buffers() as we do now for filesystems which
don't care about the special flushing treatment.
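
The dispatch proposed here amounts to an optional method with a generic
fallback. The sketch below models that in user space; it is not the real
page_launder(), and struct toy_aops, struct toy_page and every toy_* function
are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page;

/* Hypothetical per-mapping operations; flush may be left NULL. */
struct toy_aops {
    int (*flush)(struct toy_page *page);
};

struct toy_page {
    const struct toy_aops *aops;
    int flushed_by_fs;          /* set by a filesystem-specific flush */
    int freed_generic;          /* set by the generic fallback */
};

/* Stand-in for try_to_free_buffers(): the default path. */
int toy_generic_free_buffers(struct toy_page *page)
{
    page->freed_generic = 1;
    return 1;
}

/* Example of a filesystem that wants its own flushing treatment. */
int toy_fs_flush(struct toy_page *page)
{
    page->flushed_by_fs = 1;
    return 1;
}

/* What the laundering loop would do for a single page. */
int toy_launder_one(struct toy_page *page)
{
    if (page->aops != NULL && page->aops->flush != NULL)
        return page->aops->flush(page);
    return toy_generic_free_buffers(page);
}
```

Filesystems that leave flush unset take exactly the path they take today,
which is what keeps the change small for everyone except the one that
needs it.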




Re: Test12 ll_rw_block error.

2000-12-15 Thread Linus Torvalds




On Sat, 16 Dec 2000, Jeff Chua wrote:
>
> > Now, I also agree that we should be able to clean this up properly for
> > 2.5.x, and actually do exactly this for the anonymous buffers, so that
> > the VM no longer needs to worry about buffer knowledge, and fs/buffer.c
> > becomes just another user of the writepage functionality.  That is not
> > all that hard to do, it mainly just requires some small changes to how
> 
> Why not incorporate this change into 2.4.x?

It might be 10 lines of change, and obviously correct.

And it might not be. If somebody wants to try out the DirtyPage approach
for buffer handling, please do so. I'll apply it if it _does_ turn out to
be as small as I suspect it might be, and if the code is straightforward
and obvious.

If not, we're better off leaving it for 2.5.x

Linus


Re: Test12 ll_rw_block error.

2000-12-15 Thread Jeff Chua


> Now, I also agree that we should be able to clean this up properly for
> 2.5.x, and actually do exactly this for the anonymous buffers, so that
> the VM no longer needs to worry about buffer knowledge, and fs/buffer.c
> becomes just another user of the writepage functionality.  That is not
> all that hard to do, it mainly just requires some small changes to how

Why not incorporate this change into 2.4.x?

Jeff

Re: Test12 ll_rw_block error.

2000-12-15 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Stephen C. Tweedie <[EMAIL PROTECTED]> wrote:
>
>> What we really need is a way for VFS/VM to pass the pressure on filesystem.
>> That's it. If fs wants unusual completions for requests - let it have its
>> own queueing mechanism and submit these requests when it finds that convenient.
>
>There is a very clean way of doing this with address spaces.  It's
>something I would like to see done properly for 2.5: eliminate all
>knowledge of buffer_heads from the VM layer.

Note that you should be able to already get this effect with the current
2.4.0 tree.

The way to get the VM to ignore your buffer heads is to never mark the
buffers dirty, and instead mark the page they are on dirty along with
giving it a mapping (you can have a special per-superblock
"metadata-mapping" for stuff that isn't actually associated with any
particular file).

That way, the VM will just call writepage() for you, and you can use
that to schedule your writeouts (you don't actually need to write the
page the VM _asks_ you to write - you can just mark it dirty again and
consider the writepage to be mainly a VM pressure indicator).
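
As a rough user-space model of "writepage as a pressure indicator": the
filesystem below writes the page its own ordering requires and simply
redirties the page the VM asked for. Everything here is an assumption made
for illustration (struct toy_page, struct toy_fs, toy_writepage(),
toy_vm_push()); it is not the 2.4 writepage interface, only the shape of the
idea.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
    int dirty;
    int locked;
};

struct toy_fs {
    unsigned int pressure_events;       /* how often the VM pushed on us */
    struct toy_page *must_go_first;     /* page that ordering says to write first */
};

/*
 * Called with the page locked and already marked clean by the caller.
 * The filesystem decides what actually gets written.
 */
int toy_writepage(struct toy_fs *fs, struct toy_page *page)
{
    assert(page->locked);
    fs->pressure_events++;

    if (fs->must_go_first != NULL && fs->must_go_first != page) {
        /* Ordering wins: write the other page, keep this one dirty. */
        fs->must_go_first->dirty = 0;
        fs->must_go_first = NULL;
        page->dirty = 1;
    } else {
        /* Nothing stands in the way: the requested page is written. */
        fs->must_go_first = NULL;
    }

    page->locked = 0;   /* the page writer unlocks the page itself */
    return 0;
}

/* VM side: push one dirty, unlocked page towards the filesystem. */
void toy_vm_push(struct toy_fs *fs, struct toy_page *page)
{
    if (!page->dirty || page->locked)
        return;
    page->dirty = 0;
    page->locked = 1;
    toy_writepage(fs, page);
}
```

The VM will find the redirtied page again on a later pass, so pressure keeps
arriving until the filesystem has been able to write everything in the order
it needs.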

Now, I also agree that we should be able to clean this up properly for
2.5.x, and actually do exactly this for the anonymous buffers, so that
the VM no longer needs to worry about buffer knowledge, and fs/buffer.c
becomes just another user of the writepage functionality.  That is not
all that hard to do, it mainly just requires some small changes to how
"mark_buffer_dirty()" works (ie it would also mark the page dirty, so
that the VM layer would know to call "writepage()"). 

I really think almost all of the VM infrastructure for this is in place,
the PageDirty code has both simplified the VM enormously and made it a
lot more powerful at the same time.

Linus

Re: Test12 ll_rw_block error.

2000-12-15 Thread Stephen C. Tweedie

Hi,

On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> On Thu, 14 Dec 2000, Linus Torvalds wrote:
> 
> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.
> 
> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...

Right.  ext3 and reiserfs just want to submit their own IOs when it
comes to the journal.  (At least in ext3, already-journaled buffers
can be written back by the VM freely.)  It's a matter of telling the
fs when that should start.

> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.

There is a very clean way of doing this with address spaces.  It's
something I would like to see done properly for 2.5: eliminate all
knowledge of buffer_heads from the VM layer.  It would be pretty
simple to remove page->buffers completely and replace it with a
page->private pointer, owned by whatever address_space controlled the
page.  Instead of trying to unmap and flush buffers on the page
directly, these operations would become address_space operations.

We could still provide the standard try_to_free_buffers() and
unmap_underlying_metadata() functions to operate on the per-page
buffer_head lists, and existing filesystems would only have to point
their address_space "private metadata" operations at the generic
functions.  However, a filesystem which had additional ordering
constraints could then intercept the flush or writeback calls from the
VM and decide on its own how best to honour the VM pressure.
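
One way to picture the proposal is an opaque page->private pointer plus a
per-mapping operation that alone knows what the pointer means. The sketch
below is a user-space illustration with hypothetical names throughout
(struct toy_mapping_ops, release_private, the two toy filesystems); it is not
an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page;

struct toy_mapping_ops {
    /* Try to release whatever hangs off page->private; nonzero on success. */
    int (*release_private)(struct toy_page *page);
};

struct toy_page {
    const struct toy_mapping_ops *ops;
    void *private;              /* owned by the mapping; the VM never looks inside */
};

/* Generic case: private points at a plain buffer list. */
struct toy_buffers {
    int dirty;
};

int toy_generic_release(struct toy_page *page)
{
    struct toy_buffers *b = page->private;

    if (b != NULL && b->dirty)
        return 0;               /* still dirty, cannot drop yet */
    page->private = NULL;
    return 1;
}

/* Journalling case: private carries ordering state instead. */
struct toy_journaled {
    int committed;
};

int toy_journal_release(struct toy_page *page)
{
    struct toy_journaled *j = page->private;

    if (j != NULL && !j->committed)
        return 0;               /* ordering constraint: not before commit */
    page->private = NULL;
    return 1;
}

const struct toy_mapping_ops toy_generic_ops = { toy_generic_release };
const struct toy_mapping_ops toy_journal_ops = { toy_journal_release };

/* VM side: knows only the operations table, nothing about buffers. */
int toy_vm_try_release(struct toy_page *page)
{
    return page->ops->release_private(page);
}
```

Existing filesystems would point at the generic operation and behave as
before, while a filesystem with ordering constraints answers the same call
according to its own rules.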

This even works both for hashed and unhashed anonymous buffers, *if*
you allow the filesystem to attach all of its hashed buffers
to an address_space of its own rather than having the buffer cache
pool all such buffers together.

--Stephen

Re: Test12 ll_rw_block error.

2000-12-15 Thread Chris Mason




On Fri, 15 Dec 2000, Alexander Viro wrote:

> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.
> 
Somewhat.  I guess there are at least two ways to do it.  First flush the
buffers where ordering matters (log blocks), then send the others onto the
dirty list (general metadata).  You might have your own end_io for those, and
sync_buffers would lose it.

Second way (reiserfs recently changed to this method) is to do all the
flushing yourself, and remove the need for an end_io call back.

> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...
> 
> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.
> 
Yes, this is exactly what we've discussed.

-chris


Re: Test12 ll_rw_block error.

2000-12-14 Thread Alexander Viro

On Thu, 14 Dec 2000, Linus Torvalds wrote:

> Good point. 
> 
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
> 
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
> 
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
> 
> Suggestions welcome. 

Just one: any fs that really cares about completion callback is very likely
to be picky about the requests ordering. So sync_buffers() is very unlikely
to be useful anyway.

In that sense we really don't have anonymous buffers here. I seriously
suspect that "unrealistic" assumption is not unrealistic at all. I'm
not sufficiently familiar with XFS code to say for sure, but...

What we really need is a way for VFS/VM to pass the pressure on filesystem.
That's it. If fs wants unusual completions for requests - let it have its
own queueing mechanism and submit these requests when it finds that convenient.

Stephen, you probably already thought about that area. Could you comment on
that?
Cheers,
Al


Re: Test12 ll_rw_block error.

2000-12-14 Thread Linus Torvalds

On Thu, 14 Dec 2000, Russell Cattelan wrote:
> 
> Ok one more wrinkle.
> sync_buffers calls ll_rw_block, this is going to have the same problem as
> calling ll_rw_block directly.

Good point. 

This actually looks fairly nasty to fix. The obvious fix would be to not
put such buffers on the dirty list at all, and instead rely on the VM
layer calling "writepage()" when it wants to push out the pages.
That would be the nice behaviour from a VM standpoint.

However, that assumes that you don't have any "anonymous" buffers, which
is probably an unrealistic assumption.

The problem is that we don't have any per-buffer "writebuffer()" function,
the way we have them per-page. It was never needed for any of the normal
filesystems, and XFS just happened to be able to take advantage of the
b_end_io behaviour.

Suggestions welcome. 

Linus


Re: Test12 ll_rw_block error.

2000-12-14 Thread Linus Torvalds

On Thu, 14 Dec 2000, Russell Cattelan wrote:
> 
> So one more observation in
> filemap_sync_pte
> 
>  lock_page(page);
>  error = filemap_write_page(page, 1);
> ->  UnlockPage(page);
> This unlock page was removed? is that correct?

Yes. The "writepage" thing changed: "struct file" disappeared (as I'm sure
you also noticed), and the page writer is supposed to unlock the page
itself. Which it may do at any time, of course.

There are some reasons to do it only after the IO has actually completed:
this way the VM layer won't try to write it out _again_ before the first
IO has even finished, and the writing logic can possibly be
simplified if you know that nobody else will be touching that page.

But that is up to you: you can do the UnlockPage before even returning
from your "->writepage()" function, if you choose to do so.

Linus


Re: Test12 ll_rw_block error.

2000-12-14 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Russell Cattelan  <[EMAIL PROTECTED]> wrote:
>This would seem to be an error on the part of ll_rw_block.
>Setting b_end_io to a default handler without checking to see
>a callback has already been defined defeats the purpose of having
>a function op.

No.

It just means that if you have your own function op, you had better not
call "ll_rw_block()".

The problem is that as it was done before, people would set the function
op without actually holding the buffer lock.

Which meant that you could have two unrelated users setting the function
op to two different things, and it would be only a matter of the purest
luck which one happened to "win".

If you want to set your own end-operation, you now need to lock the
buffer _first_, then set "b_end_io" to your operation, and then do a
"submit_bh()". You cannot use ll_rw_block().

Yes, this is different than before. Sorry about that.

But yes, this way actually happens to work reliably.
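
As a rough illustration of that ordering, here is a userspace model. It is
a sketch under assumptions, not kernel code: the struct is a cut-down
stand-in for a buffer head, every "model_" function is a simplified
imitation of the kernel function it is named after (the real ones take
other arguments and can sleep), and "fs_end_io()" and "fs_write_buffer()"
are hypothetical filesystem code. IO is modeled as completing immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buffer_head;
typedef void (bh_end_io_t)(struct buffer_head *bh, int uptodate);

/* Cut-down stand-in; the real structure has many more fields. */
struct buffer_head {
    bool locked;            /* models the buffer lock */
    bool uptodate;
    bh_end_io_t *b_end_io;  /* completion callback */
    void *b_private;        /* owner's data for the callback */
};

void model_lock_buffer(struct buffer_head *bh)
{
    assert(!bh->locked);    /* the real lock would wait instead */
    bh->locked = true;
}

void model_unlock_buffer(struct buffer_head *bh)
{
    bh->locked = false;
}

/* Models submit_bh(): uses whatever callback is set on the locked buffer. */
void model_submit_bh(struct buffer_head *bh)
{
    assert(bh->locked && bh->b_end_io != NULL);
    bh->b_end_io(bh, 1);    /* IO "completes" at once in this model */
}

/* Models the default completion handler. */
void model_end_io_default(struct buffer_head *bh, int uptodate)
{
    bh->uptodate = uptodate;
    model_unlock_buffer(bh);
}

/* Models ll_rw_block() for one buffer: it installs the default handler. */
void model_ll_rw_block(struct buffer_head *bh)
{
    model_lock_buffer(bh);
    bh->b_end_io = model_end_io_default;  /* any earlier callback is lost */
    model_submit_bh(bh);
}

/* Hypothetical filesystem callback: counts completions, then unlocks. */
void fs_end_io(struct buffer_head *bh, int uptodate)
{
    int *completions = bh->b_private;
    bh->uptodate = uptodate;
    (*completions)++;
    model_unlock_buffer(bh);
}

/* The ordering described above: lock first, set b_end_io, then submit. */
void fs_write_buffer(struct buffer_head *bh, int *completions)
{
    model_lock_buffer(bh);
    bh->b_private = completions;
    bh->b_end_io = fs_end_io;
    model_submit_bh(bh);
}
```

In this model fs_write_buffer() always ends in fs_end_io(), because the
callback is only ever written while the lock is held. Sending the same
buffer through model_ll_rw_block() afterwards replaces the callback with
the default one, which is why the two paths should not be mixed.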

Linus

Re: Test12 ll_rw_block error.

2000-12-14 Thread Alexander Viro




On Thu, 14 Dec 2000, Linus Torvalds wrote:

> Good point.
> 
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
> 
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
> 
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
> 
> Suggestions welcome.

Just one: any fs that really cares about completion callbacks is very likely
to be picky about request ordering. So sync_buffers() is very unlikely
to be useful anyway.

In that sense we really don't have anonymous buffers here. I seriously
suspect that "unrealistic" assumption is not unrealistic at all. I'm
not sufficiently familiar with XFS code to say for sure, but...

What we really need is a way for VFS/VM to pass the pressure on to the
filesystem. That's it. If a fs wants unusual completions for requests, let it
have its own queueing mechanism and submit these requests when it finds that
convenient.
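
A minimal userspace sketch of that idea, assuming a hypothetical hook
through which the VM reports pressure. None of these names exist in the
kernel: "struct fs_queue", "fs_queue_add()", "fs_request_done()" and
"fs_memory_pressure()" are made up for illustration. The filesystem keeps
its own FIFO of pending requests, submits them in its own order, and runs
its own completion code. IO is modeled as completing immediately.

```c
#include <assert.h>
#include <stddef.h>

#define FS_QUEUE_MAX 8

/* Hypothetical request owned by the filesystem. */
struct fs_request {
    int block;  /* which block this request writes */
    int done;   /* set by the filesystem's own completion code */
};

/* Hypothetical per-filesystem queue: a small ring buffer in FIFO order. */
struct fs_queue {
    struct fs_request *pending[FS_QUEUE_MAX];
    size_t head;   /* index of the oldest pending request */
    size_t count;  /* number of pending requests */
    int last_completed_block;
};

/* Queue a request instead of putting it on a global dirty list. */
int fs_queue_add(struct fs_queue *q, struct fs_request *req)
{
    if (q->count == FS_QUEUE_MAX)
        return -1;  /* queue full */
    q->pending[(q->head + q->count) % FS_QUEUE_MAX] = req;
    q->count++;
    return 0;
}

/* The filesystem's own completion handling. */
void fs_request_done(struct fs_queue *q, struct fs_request *req)
{
    req->done = 1;
    q->last_completed_block = req->block;
}

/*
 * Hypothetical pressure hook: the VM asks for up to nr requests to be
 * pushed out. The filesystem submits them oldest first, so ordering stays
 * under its control. Returns the number of requests submitted.
 */
size_t fs_memory_pressure(struct fs_queue *q, size_t nr)
{
    size_t submitted = 0;

    while (submitted < nr && q->count > 0) {
        struct fs_request *req = q->pending[q->head];

        q->head = (q->head + 1) % FS_QUEUE_MAX;
        q->count--;
        fs_request_done(q, req);  /* IO completes at once in this model */
        submitted++;
    }
    return submitted;
}
```

The VM side only ever calls fs_memory_pressure(); it never touches the
individual requests, so nothing outside the filesystem can reorder them
or replace their completion handling.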

Stephen, you probably already thought about that area. Could you comment on
that?
Cheers,
Al

