Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-19 Thread Eric W. Biederman
Christian Borntraeger <[EMAIL PROTECTED]> writes:

> Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman:
>> Grr. Inconsistent rules on a core piece of infrastructure.
>> It looks like that if there is any trivial/minimal fix it
>> is based on your patch suppressing try_to_free_buffers.  Ugh.
>> 
>> Eric
>
> Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24
> and doing a nicer fix (rd rewrite for example for post 2.6.24)?

Looking at it.  If we don't get carried away using our own private
inode is barely more difficult then stomping on release_page and
in a number of ways a whole lot more subtle.  At least for 2.6.24 I
think it makes a sane fix, and quite possibly as a back port as well.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-19 Thread Eric W. Biederman
Christian Borntraeger [EMAIL PROTECTED] writes:

 Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman:
 Grr. Inconsistent rules on a core piece of infrastructure.
 It looks like that if there is any trivial/minimal fix it
 is based on your patch suppressing try_to_free_buffers.  Ugh.
 
 Eric

 Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24
 and doing a nicer fix (rd rewrite for example for post 2.6.24)?

Looking at it.  If we don't get carried away using our own private
inode is barely more difficult then stomping on release_page and
in a number of ways a whole lot more subtle.  At least for 2.6.24 I
think it makes a sane fix, and quite possibly as a back port as well.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-18 Thread Christian Borntraeger
Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman:
> Grr. Inconsistent rules on a core piece of infrastructure.
> It looks like that if there is any trivial/minimal fix it
> is based on your patch suppressing try_to_free_buffers.  Ugh.
> 
> Eric

Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24
and doing a nicer fix (rd rewrite for example for post 2.6.24)?

Christian
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-18 Thread Christian Borntraeger
Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman:
 Grr. Inconsistent rules on a core piece of infrastructure.
 It looks like that if there is any trivial/minimal fix it
 is based on your patch suppressing try_to_free_buffers.  Ugh.
 
 Eric

Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24
and doing a nicer fix (rd rewrite for example for post 2.6.24)?

Christian
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason <[EMAIL PROTECTED]> writes:

> On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote:
>> Chris Mason <[EMAIL PROTECTED]> writes:
>> 
>> > So, the problem is using the Dirty bit to indicate pinned.  You're
>> > completely right that our current setup of buffer heads and pages and
>> > filesystpem metadata is complex and difficult.
>> >
>> > But, moving the buffer heads off of the page cache pages isn't going to
>> > make it any easier to use dirty as pinned, especially in the face of
>> > buffer_head users for file data pages.
>> 
>> Let me specific.  Not moving buffer_heads off of page cache pages,
>> moving buffer_heads off of the block devices page cache pages.
>> 
>> My problem is the coupling of how block devices are cached and the
>> implementation of buffer heads, and by removing that coupling
>> we can generally make things better.  Currently that coupling
>> means silly things like all block devices are cached in low memory.
>> Which probably isn't what you want if you actually have a use
>> for block devices.
>> 
>> For the ramdisk case in particular what this means is that there
>> are no more users that create buffer_head mappings on the block
>> device cache so using the dirty bit will be safe.
>
> Ok, we move the buffer heads off to a different inode, and that indoe
> has pages.  The pages on the inode still need to get pinned, how does
> that pinning happen?

> The problem you described where someone cleans a page because the buffer
> heads are clean happens already without help from userland.  So, keeping
> the pages away from userland won't save them from cleaning.
>
> Sorry if I'm reading your suggestion wrong...

Yes.  I was suggesting to continue to pin the pages for the page
cache pages block device inode, and having the buffer cache live
on some other inode.  Thus not causing me problems by adding clean
buffer_heads to my dirty pages.

Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote:
> Chris Mason <[EMAIL PROTECTED]> writes:
> 
> > So, the problem is using the Dirty bit to indicate pinned.  You're
> > completely right that our current setup of buffer heads and pages and
> > filesystpem metadata is complex and difficult.
> >
> > But, moving the buffer heads off of the page cache pages isn't going to
> > make it any easier to use dirty as pinned, especially in the face of
> > buffer_head users for file data pages.
> 
> Let me specific.  Not moving buffer_heads off of page cache pages,
> moving buffer_heads off of the block devices page cache pages.
> 
> My problem is the coupling of how block devices are cached and the
> implementation of buffer heads, and by removing that coupling
> we can generally make things better.  Currently that coupling
> means silly things like all block devices are cached in low memory.
> Which probably isn't what you want if you actually have a use
> for block devices.
> 
> For the ramdisk case in particular what this means is that there
> are no more users that create buffer_head mappings on the block
> device cache so using the dirty bit will be safe.

Ok, we move the buffer heads off to a different inode, and that indoe
has pages.  The pages on the inode still need to get pinned, how does
that pinning happen?

The problem you described where someone cleans a page because the buffer
heads are clean happens already without help from userland.  So, keeping
the pages away from userland won't save them from cleaning.

Sorry if I'm reading your suggestion wrong...

-chris


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason <[EMAIL PROTECTED]> writes:

> So, the problem is using the Dirty bit to indicate pinned.  You're
> completely right that our current setup of buffer heads and pages and
> filesystpem metadata is complex and difficult.
>
> But, moving the buffer heads off of the page cache pages isn't going to
> make it any easier to use dirty as pinned, especially in the face of
> buffer_head users for file data pages.

Let me specific.  Not moving buffer_heads off of page cache pages,
moving buffer_heads off of the block devices page cache pages.

My problem is the coupling of how block devices are cached and the
implementation of buffer heads, and by removing that coupling
we can generally make things better.  Currently that coupling
means silly things like all block devices are cached in low memory.
Which probably isn't what you want if you actually have a use
for block devices.

For the ramdisk case in particular what this means is that there
are no more users that create buffer_head mappings on the block
device cache so using the dirty bit will be safe. 

Further it removes the nasty possibility of user space messing with
metadata buffer head state.  So the only way those cases can happen is
a code bug, or a hardware bug.

So I think by removing these unnecessary code paths things will
become easier to work with.

> You've already seen Nick fsblock code, but you can see my general
> approach to replacing buffer heads here:
>
> http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h
>
> (alpha quality implementation in extent_map.c and users in inode.c)  The
> basic idea is to do extent based record keeping for mapping and state of
> things in the filesystem, and to avoid attaching these things to the
> page.

Interesting.  Something to dig into.

> Don't get me wrong, I'd love to see a simple and coherent fix for what
> reiserfs and ext3 do with buffer head state, but I think for the short
> term you're best off pinning the ramdisk pages via some other means.

Yes.  And the problem is hard enough to trigger that a short term fix
is actually of debatable value.  The reason this hasn't shown up more
frequently is that it only ever triggers if you are in the buffer head
reclaim state, which on a 64bit box means you have to use < 4K buffers
and have your ram cache another block device.  That plus most people
use initramfs these days.

For the short term we have Christian's other patch which simply
disables calling try_to_free_buffers.  Although that really feels
like a hack to me.

For 2.6.25 I think I have a shot at fixing these things cleanly.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 15:30 -0600, Eric W. Biederman wrote:
> Chris Mason <[EMAIL PROTECTED]> writes:
> 
> >> Thinking about it.  I don't believe anyone has ever intentionally built
> >> a filesystem tool that depends on being able to modify a file systems
> >> metadata buffer heads while the filesystem is running, and doing that
> >> would seem to be fragile as it would require a lot of cooperation
> >> between the tool and the filesystem about how the filesystem uses and
> >> implement things.
> >> 
> >
> > That's right.  For example, ext2 is doing directories in the page cache
> > of the directory inode, so there's a cache alias between the block
> > device page cache and the directory inode page cache.
> >
> >> Now I guess I need to see how difficult a patch would be to give
> >> filesystems magic inodes to keep their metadata buffer heads in.
> >
> > Not hard, the block device inode is already a magic inode for metadata
> > buffer heads.  You could just make another one attached to the bdev.
> >
> > But, I don't think I fully understand the problem you're trying to
> > solve?
> 
> 
> So the start:
> When we write buffers from the buffer cache we clear buffer_dirty
> but not PageDirty
>
> So try_to_free_buffers() will mark any page with clean buffer_heads
> that is not clean itself clean.
> 
> The ramdisk set pages dirty to keep them from being removed from the
> page cache, just like ramfs.
> 
So, the problem is using the Dirty bit to indicate pinned.  You're
completely right that our current setup of buffer heads and pages and
filesystem metadata is complex and difficult.

But, moving the buffer heads off of the page cache pages isn't going to
make it any easier to use dirty as pinned, especially in the face of
buffer_head users for file data pages.

You've already seen Nick fsblock code, but you can see my general
approach to replacing buffer heads here:

http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h

(alpha quality implementation in extent_map.c and users in inode.c)  The
basic idea is to do extent based record keeping for mapping and state of
things in the filesystem, and to avoid attaching these things to the
page.

> Unfortunately when those dirty ramdisk pages get buffers on them and
> those buffers all go clean and we are trying to reclaim buffer_heads
> we drop those pages from the page cache.   Ouch!
> 
> We can fix the ramdisk by setting making certain that buffer_heads
> on ramdisk pages stay dirty as well.  The problem is this breaks
> filesystems like reiserfs and ext3 that expect to be able to make 
> buffer_heads clean sometimes.
> 
> There are other ways to solve this for ramdisks, such as changing
> where ramdisks are stored.  However fixing the ramdisks this way
> still leaves the general problem that there are other paths to the
> filesystem metadata buffers, and those other paths cause the code
> to be complicated and buggy.
> 
> So I'm trying to see if we can untangle this Gordian knot, so the
> code because more easily maintainable.  
> 

Don't get me wrong, I'd love to see a simple and coherent fix for what
reiserfs and ext3 do with buffer head state, but I think for the short
term you're best off pinning the ramdisk pages via some other means.

-chris


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Christian Borntraeger <[EMAIL PROTECTED]> writes:

> Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman:
>> Did you have both of my changes applied?
>> To init_page_buffer() and to the ramdisk_set_dirty_page?
>
> Yes, I removed my patch and applied both patches from you. 

Thanks.

Grr. Inconsistent rules on a core piece of infrastructure.
It looks like that if there is any trivial/minimal fix it
is based on your patch suppressing try_to_free_buffers.  Ugh.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Christian Borntraeger
Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman:
> Did you have both of my changes applied?
> To init_page_buffer() and to the ramdisk_set_dirty_page?

Yes, I removed my patch and applied both patches from you. 

Christian
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason <[EMAIL PROTECTED]> writes:

>> Thinking about it.  I don't believe anyone has ever intentionally built
>> a filesystem tool that depends on being able to modify a file systems
>> metadata buffer heads while the filesystem is running, and doing that
>> would seem to be fragile as it would require a lot of cooperation
>> between the tool and the filesystem about how the filesystem uses and
>> implement things.
>> 
>
> That's right.  For example, ext2 is doing directories in the page cache
> of the directory inode, so there's a cache alias between the block
> device page cache and the directory inode page cache.
>
>> Now I guess I need to see how difficult a patch would be to give
>> filesystems magic inodes to keep their metadata buffer heads in.
>
> Not hard, the block device inode is already a magic inode for metadata
> buffer heads.  You could just make another one attached to the bdev.
>
> But, I don't think I fully understand the problem you're trying to
> solve?


So the start:
When we write buffers from the buffer cache we clear buffer_dirty
but not PageDirty

So try_to_free_buffers() will mark any page with clean buffer_heads
that is not clean itself clean.

The ramdisk set pages dirty to keep them from being removed from the
page cache, just like ramfs.

Unfortunately when those dirty ramdisk pages get buffers on them and
those buffers all go clean and we are trying to reclaim buffer_heads
we drop those pages from the page cache.   Ouch!

We can fix the ramdisk by setting making certain that buffer_heads
on ramdisk pages stay dirty as well.  The problem is this breaks
filesystems like reiserfs and ext3 that expect to be able to make 
buffer_heads clean sometimes.

There are other ways to solve this for ramdisks, such as changing
where ramdisks are stored.  However fixing the ramdisks this way
still leaves the general problem that there are other paths to the
filesystem metadata buffers, and those other paths cause the code
to be complicated and buggy.

So I'm trying to see if we can untangle this Gordian knot, so the
code because more easily maintainable.  

To make the buffer cache a helper library instead of require
infrastructure, it looks like two things need to happen.
- Move metadata buffer heads off block devices page cache entries, 
- Communicate the mappings of data pages to block device sectors
  in a generic way without buffer heads.

How we ultimately fix the ramdisk tends to depend on how we untangle
the buffer head problem.  Right now the only simple solution is to
suppress try_to_free_buffers, which is a bit ugly.  We can also come
up with a completely separate store for the pages in the buffer cache
but if we wind up moving the metadata buffer heads anyway then that
should not be necessary.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 14:29 -0600, Eric W. Biederman wrote:
> Chris Mason <[EMAIL PROTECTED]> writes:
> 
> > In this case, the commit block isn't allowed to be dirty before reiserfs
> > decides it is safe to write it.  The journal code expects it is the only
> > spot in the kernel setting buffer heads dirty, and it only does so after
> > the rest of the log blocks are safely on disk.
> 
> Ok.  So the journal code here fundamentally depends on being able to
> control the order of the writes, and something else being able to set
> the buffer head dirty messes up that control.
> 

Right.

> >> At the same time I increasingly don't think we should allow user space
> >> to dirty or update our filesystem metadata buffer heads.  That seems
> >> like asking for trouble.
> >> 
> >
> > Demanding trouble ;)
> 
> Looks like it.  There are even comments in jbd about the same class
> of problems.  Apparently dump and tune2fs on mounted filesystems have
> triggered some of these issues.  The practical question is any of this
> trouble worth handling.
> 
> Thinking about it.  I don't believe anyone has ever intentionally built
> a filesystem tool that depends on being able to modify a file systems
> metadata buffer heads while the filesystem is running, and doing that
> would seem to be fragile as it would require a lot of cooperation
> between the tool and the filesystem about how the filesystem uses and
> implement things.
> 

That's right.  For example, ext2 is doing directories in the page cache
of the directory inode, so there's a cache alias between the block
device page cache and the directory inode page cache.

> Now I guess I need to see how difficult a patch would be to give
> filesystems magic inodes to keep their metadata buffer heads in.

Not hard, the block device inode is already a magic inode for metadata
buffer heads.  You could just make another one attached to the bdev.

But, I don't think I fully understand the problem you're trying to
solve?

-chris


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason <[EMAIL PROTECTED]> writes:

> In this case, the commit block isn't allowed to be dirty before reiserfs
> decides it is safe to write it.  The journal code expects it is the only
> spot in the kernel setting buffer heads dirty, and it only does so after
> the rest of the log blocks are safely on disk.

Ok.  So the journal code here fundamentally depends on being able to
control the order of the writes, and something else being able to set
the buffer head dirty messes up that control.

> Given this is a ramdisk, the check can be ignored, but I'd rather not
> sprinkle if (ram_disk) into the FS code

It makes no sense to implement a ramdisk in a way that requires this.


>> At the same time I increasingly don't think we should allow user space
>> to dirty or update our filesystem metadata buffer heads.  That seems
>> like asking for trouble.
>> 
>
> Demanding trouble ;)

Looks like it.  There are even comments in jbd about the same class
of problems.  Apparently dump and tune2fs on mounted filesystems have
triggered some of these issues.  The practical question is any of this
trouble worth handling.

Thinking about it.  I don't believe anyone has ever intentionally built
a filesystem tool that depends on being able to modify a file systems
metadata buffer heads while the filesystem is running, and doing that
would seem to be fragile as it would require a lot of cooperation
between the tool and the filesystem about how the filesystem uses and
implement things.

Now I guess I need to see how difficult a patch would be to give
filesystems magic inodes to keep their metadata buffer heads in.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 11:57 -0600, Eric W. Biederman wrote:
> Christian Borntraeger <[EMAIL PROTECTED]> writes:
> 
> > Eric,
> >
> > Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger:
> >> Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
> >> 
> >> > fs/buffer.c |3 +++
> >> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >> >  drivers/block/rd.c |   13 +
> >> >  1 files changed, 1 insertions(+), 12 deletions(-)
> >> 
> >> Your patches look sane so far. I have applied both patches, and the 
> >> problem 
> >> seems gone. I will try to get these patches to our testers.
> >> 
> >> As long as they dont find new problems:
> >
> > Our testers did only a short test, and then they were stopped by problems 
> > with
> > reiserfs. At the moment I cannot say for sure if your patch caused this, 
> > but 
> > we got the following BUG
> 
> Thanks.
> 
> > ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr
> > storage.
> > [ cut here ]
> > kernel BUG at
> > /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117!
> > illegal operation: 0001 [#1]
> > Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur
> > CPU:3Not tainted
> > Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88)
> > Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs])
> >R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0
> > Krnl GPRS: 0002 7411b5c8 002b 
> >7b04d000 0001  76d1de00
> >7513eec0 0003 0012 77f77680
> >7411b608 fb343b7e fb34404a 7513ee50
> > Krnl Code: fb344374: a7210002   tmll%r2,2
> >fb344378: a7840004   brc 8,fb344380
> >fb34437c: a7f40001   brc 15,fb34437e
> >   >fb344380: 5810d8c2   l   %r1,2242(%r13)
> >fb344384: 5820b03c   l   %r2,60(%r11)
> >fb344388: 0de1   basr%r14,%r1
> >fb34438a: 5810d90e   l   %r1,2318(%r13)
> >fb34438e: 5820b03c   l   %r2,60(%r11)
> >
> >
> > Looking at the code, this really seems related to dirty buffers, so your 
> > patch
> > is the main suspect at the moment. 
> 
> Sounds reasonable.
> 
> > if (!barrier) {
> > /* If there was a write error in the journal - we can't 
> > commit
> >  * this transaction - it will be invalid and, if successful,
> >  * will just end up propagating the write error out to
> >  * the file system. */
> > if (likely(!retval && !reiserfs_is_journal_aborted 
> > (journal))) {
> > if (buffer_dirty(jl->j_commit_bh))
> > 1117>BUG();
> > mark_buffer_dirty(jl->j_commit_bh) ;
> > sync_dirty_buffer(jl->j_commit_bh) ;
> > }
> > }
> 
> Grr.  I'm not certain how to call that.
> 
> Given that I should also be able to trigger this case by writing to
> the block device through the buffer cache (to the write spot at the
> write moment) this feels like a reiserfs bug. 
> Although I feel
> screaming about filesystems that go BUG instead of remount read-only
> 

In this case, the commit block isn't allowed to be dirty before reiserfs
decides it is safe to write it.  The journal code expects it is the only
spot in the kernel setting buffer heads dirty, and it only does so after
the rest of the log blocks are safely on disk.

Given this is a ramdisk, the check can be ignored, but I'd rather not
sprinkle if (ram_disk) into the FS code

> At the same time I increasingly don't think we should allow user space
> to dirty or update our filesystem metadata buffer heads.  That seems
> like asking for trouble.
> 

Demanding trouble ;)

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Christian Borntraeger <[EMAIL PROTECTED]> writes:

> Eric,
>
> Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger:
>> Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
>> 
>> > fs/buffer.c |    3 +++
>> > 1 files changed, 3 insertions(+), 0 deletions(-)
>> >  drivers/block/rd.c |   13 +
>> >  1 files changed, 1 insertions(+), 12 deletions(-)
>> 
>> Your patches look sane so far. I have applied both patches, and the problem 
>> seems gone. I will try to get these patches to our testers.
>> 
>> As long as they dont find new problems:
>
> Our testers did only a short test, and then they were stopped by problems with
> reiserfs. At the moment I cannot say for sure if your patch caused this, but 
> we got the following BUG

Thanks.

> ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr
> storage.
> [ cut here ]
> kernel BUG at
> /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117!
> illegal operation: 0001 [#1]
> Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur
> CPU:3Not tainted
> Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88)
> Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs])
>R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0
> Krnl GPRS: 0002 7411b5c8 002b 
>7b04d000 0001  76d1de00
>7513eec0 0003 0012 77f77680
>7411b608 fb343b7e fb34404a 7513ee50
> Krnl Code: fb344374: a7210002   tmll%r2,2
>fb344378: a7840004   brc 8,fb344380
>fb34437c: a7f40001   brc 15,fb34437e
>   >fb344380: 5810d8c2   l   %r1,2242(%r13)
>fb344384: 5820b03c   l   %r2,60(%r11)
>fb344388: 0de1   basr%r14,%r1
>fb34438a: 5810d90e   l   %r1,2318(%r13)
>fb34438e: 5820b03c   l   %r2,60(%r11)
>
>
> Looking at the code, this really seems related to dirty buffers, so your patch
> is the main suspect at the moment. 

Sounds reasonable.

> if (!barrier) {
> /* If there was a write error in the journal - we can't commit
>  * this transaction - it will be invalid and, if successful,
>  * will just end up propagating the write error out to
>  * the file system. */
> if (likely(!retval && !reiserfs_is_journal_aborted 
> (journal))) {
> if (buffer_dirty(jl->j_commit_bh))
> 1117>BUG();
> mark_buffer_dirty(jl->j_commit_bh) ;
> sync_dirty_buffer(jl->j_commit_bh) ;
> }
> }

Grr.  I'm not certain how to call that.

Given that I should also be able to trigger this case by writing to
the block device through the buffer cache (to the write spot at the
write moment) this feels like a reiserfs bug.  Although I feel
screaming about filesystems that go BUG instead of remount read-only

At the same time I increasingly don't think we should allow user space
to dirty or update our filesystem metadata buffer heads.  That seems
like asking for trouble.

Did you have both of my changes applied?
To init_page_buffer() and to the ramdisk_set_dirty_page?

I'm guessing it was the change to ramdisk_set_dirty_page

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Christian Borntraeger
Eric,

Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger:
> Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
> 
> > fs/buffer.c |    3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >  drivers/block/rd.c |   13 +
> >  1 files changed, 1 insertions(+), 12 deletions(-)
> 
> Your patches look sane so far. I have applied both patches, and the problem 
> seems gone. I will try to get these patches to our testers.
> 
> As long as they dont find new problems:

Our testers did only a short test, and then they were stopped by problems with
reiserfs. At the moment I cannot say for sure if your patch caused this, but 
we got the following BUG

ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr 
storage.
[ cut here ]
kernel BUG at 
/home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117!
illegal operation: 0001 [#1]
Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur
CPU:3Not tainted
Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88)
Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0
Krnl GPRS: 0002 7411b5c8 002b 
   7b04d000 0001  76d1de00
   7513eec0 0003 0012 77f77680
   7411b608 fb343b7e fb34404a 7513ee50
Krnl Code: fb344374: a7210002   tmll%r2,2
   fb344378: a7840004   brc 8,fb344380
   fb34437c: a7f40001   brc 15,fb34437e
  >fb344380: 5810d8c2   l   %r1,2242(%r13)
   fb344384: 5820b03c   l   %r2,60(%r11)
   fb344388: 0de1   basr%r14,%r1
   fb34438a: 5810d90e   l   %r1,2318(%r13)
   fb34438e: 5820b03c   l   %r2,60(%r11)


Looking at the code, this really seems related to dirty buffers, so your patch
is the main suspect at the moment. 

if (!barrier) {
/* If there was a write error in the journal - we can't commit
 * this transaction - it will be invalid and, if successful,
 * will just end up propagating the write error out to
 * the file system. */
if (likely(!retval && !reiserfs_is_journal_aborted (journal))) {
if (buffer_dirty(jl->j_commit_bh))
1117>BUG();
mark_buffer_dirty(jl->j_commit_bh) ;
sync_dirty_buffer(jl->j_commit_bh) ;
}
}

Christian
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Christian Borntraeger [EMAIL PROTECTED] writes:

 Eric,

 Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger:
 Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
 
  fs/buffer.c |    3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
   drivers/block/rd.c |   13 +
   1 files changed, 1 insertions(+), 12 deletions(-)
 
 Your patches look sane so far. I have applied both patches, and the problem 
 seems gone. I will try to get these patches to our testers.
 
 As long as they dont find new problems:

 Our testers did only a short test, and then they were stopped by problems with
 reiserfs. At the moment I cannot say for sure if your patch caused this, but 
 we got the following BUG

Thanks.

 ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr
 storage.
 [ cut here ]
 kernel BUG at
 /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117!
 illegal operation: 0001 [#1]
 Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur
 CPU:3Not tainted
 Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88)
 Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0
 Krnl GPRS: 0002 7411b5c8 002b 
7b04d000 0001  76d1de00
7513eec0 0003 0012 77f77680
7411b608 fb343b7e fb34404a 7513ee50
 Krnl Code: fb344374: a7210002   tmll%r2,2
fb344378: a7840004   brc 8,fb344380
fb34437c: a7f40001   brc 15,fb34437e
   fb344380: 5810d8c2   l   %r1,2242(%r13)
fb344384: 5820b03c   l   %r2,60(%r11)
fb344388: 0de1   basr%r14,%r1
fb34438a: 5810d90e   l   %r1,2318(%r13)
fb34438e: 5820b03c   l   %r2,60(%r11)


 Looking at the code, this really seems related to dirty buffers, so your patch
 is the main suspect at the moment. 

Sounds reasonable.

 if (!barrier) {
 /* If there was a write error in the journal - we can't commit
  * this transaction - it will be invalid and, if successful,
  * will just end up propagating the write error out to
  * the file system. */
 if (likely(!retval  !reiserfs_is_journal_aborted 
 (journal))) {
 if (buffer_dirty(jl-j_commit_bh))
 1117BUG();
 mark_buffer_dirty(jl-j_commit_bh) ;
 sync_dirty_buffer(jl-j_commit_bh) ;
 }
 }

Grr.  I'm not certain how to call that.

Given that I should also be able to trigger this case by writing to
the block device through the buffer cache (to the write spot at the
write moment) this feels like a reiserfs bug.  Although I feel
screaming about filesystems that go BUG instead of remount read-only

At the same time I increasingly don't think we should allow user space
to dirty or update our filesystem metadata buffer heads.  That seems
like asking for trouble.

Did you have both of my changes applied?
To init_page_buffer() and to the ramdisk_set_dirty_page?

I'm guessing it was the change to ramdisk_set_dirty_page

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 11:57 -0600, Eric W. Biederman wrote:
 Christian Borntraeger [EMAIL PROTECTED] writes:
 
  Eric,
 
  Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger:
  Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
  
   fs/buffer.c |3 +++
   1 files changed, 3 insertions(+), 0 deletions(-)
drivers/block/rd.c |   13 +
1 files changed, 1 insertions(+), 12 deletions(-)
  
  Your patches look sane so far. I have applied both patches, and the 
  problem 
  seems gone. I will try to get these patches to our testers.
  
  As long as they dont find new problems:
 
  Our testers did only a short test, and then they were stopped by problems 
  with
  reiserfs. At the moment I cannot say for sure if your patch caused this, 
  but 
  we got the following BUG
 
 Thanks.
 
  ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr
  storage.
  [ cut here ]
  kernel BUG at
  /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117!
  illegal operation: 0001 [#1]
  Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur
  CPU:3Not tainted
  Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88)
  Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs])
 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0
  Krnl GPRS: 0002 7411b5c8 002b 
 7b04d000 0001  76d1de00
 7513eec0 0003 0012 77f77680
 7411b608 fb343b7e fb34404a 7513ee50
  Krnl Code: fb344374: a7210002   tmll%r2,2
 fb344378: a7840004   brc 8,fb344380
 fb34437c: a7f40001   brc 15,fb34437e
fb344380: 5810d8c2   l   %r1,2242(%r13)
 fb344384: 5820b03c   l   %r2,60(%r11)
 fb344388: 0de1   basr%r14,%r1
 fb34438a: 5810d90e   l   %r1,2318(%r13)
 fb34438e: 5820b03c   l   %r2,60(%r11)
 
 
  Looking at the code, this really seems related to dirty buffers, so your 
  patch
  is the main suspect at the moment. 
 
 Sounds reasonable.
 
  if (!barrier) {
  /* If there was a write error in the journal - we can't 
  commit
   * this transaction - it will be invalid and, if successful,
   * will just end up propagating the write error out to
   * the file system. */
  if (likely(!retval  !reiserfs_is_journal_aborted 
  (journal))) {
  if (buffer_dirty(jl-j_commit_bh))
  1117BUG();
  mark_buffer_dirty(jl-j_commit_bh) ;
  sync_dirty_buffer(jl-j_commit_bh) ;
  }
  }
 
 Grr.  I'm not certain how to call that.
 
 Given that I should also be able to trigger this case by writing to
 the block device through the buffer cache (to the write spot at the
 write moment) this feels like a reiserfs bug. 
 Although I feel
 screaming about filesystems that go BUG instead of remount read-only
 

In this case, the commit block isn't allowed to be dirty before reiserfs
decides it is safe to write it.  The journal code expects it is the only
spot in the kernel setting buffer heads dirty, and it only does so after
the rest of the log blocks are safely on disk.

Given this is a ramdisk, the check can be ignored, but I'd rather not
sprinkle if (ram_disk) into the FS code

 At the same time I increasingly don't think we should allow user space
 to dirty or update our filesystem metadata buffer heads.  That seems
 like asking for trouble.
 

Demanding trouble ;)

-chris

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason [EMAIL PROTECTED] writes:

 In this case, the commit block isn't allowed to be dirty before reiserfs
 decides it is safe to write it.  The journal code expects it is the only
 spot in the kernel setting buffer heads dirty, and it only does so after
 the rest of the log blocks are safely on disk.

Ok.  So the journal code here fundamentally depends on being able to
control the order of the writes, and something else being able to set
the buffer head dirty messes up that control.

 Given this is a ramdisk, the check can be ignored, but I'd rather not
 sprinkle if (ram_disk) into the FS code

It makes no sense to implement a ramdisk in a way that requires this.


 At the same time I increasingly don't think we should allow user space
 to dirty or update our filesystem metadata buffer heads.  That seems
 like asking for trouble.
 

 Demanding trouble ;)

Looks like it.  There are even comments in jbd about the same class
of problems.  Apparently dump and tune2fs on mounted filesystems have
triggered some of these issues.  The practical question is any of this
trouble worth handling.

Thinking about it.  I don't believe anyone has ever intentionally built
a filesystem tool that depends on being able to modify a file systems
metadata buffer heads while the filesystem is running, and doing that
would seem to be fragile as it would require a lot of cooperation
between the tool and the filesystem about how the filesystem uses and
implement things.

Now I guess I need to see how difficult a patch would be to give
filesystems magic inodes to keep their metadata buffer heads in.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 14:29 -0600, Eric W. Biederman wrote:
 Chris Mason [EMAIL PROTECTED] writes:
 
  In this case, the commit block isn't allowed to be dirty before reiserfs
  decides it is safe to write it.  The journal code expects it is the only
  spot in the kernel setting buffer heads dirty, and it only does so after
  the rest of the log blocks are safely on disk.
 
 Ok.  So the journal code here fundamentally depends on being able to
 control the order of the writes, and something else being able to set
 the buffer head dirty messes up that control.
 

Right.

  At the same time I increasingly don't think we should allow user space
  to dirty or update our filesystem metadata buffer heads.  That seems
  like asking for trouble.
  
 
  Demanding trouble ;)
 
 Looks like it.  There are even comments in jbd about the same class
 of problems.  Apparently dump and tune2fs on mounted filesystems have
 triggered some of these issues.  The practical question is any of this
 trouble worth handling.
 
 Thinking about it.  I don't believe anyone has ever intentionally built
 a filesystem tool that depends on being able to modify a file systems
 metadata buffer heads while the filesystem is running, and doing that
 would seem to be fragile as it would require a lot of cooperation
 between the tool and the filesystem about how the filesystem uses and
 implement things.
 

That's right.  For example, ext2 is doing directories in the page cache
of the directory inode, so there's a cache alias between the block
device page cache and the directory inode page cache.

 Now I guess I need to see how difficult a patch would be to give
 filesystems magic inodes to keep their metadata buffer heads in.

Not hard, the block device inode is already a magic inode for metadata
buffer heads.  You could just make another one attached to the bdev.

But, I don't think I fully understand the problem you're trying to
solve?

-chris


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason [EMAIL PROTECTED] writes:

 Thinking about it.  I don't believe anyone has ever intentionally built
 a filesystem tool that depends on being able to modify a file systems
 metadata buffer heads while the filesystem is running, and doing that
 would seem to be fragile as it would require a lot of cooperation
 between the tool and the filesystem about how the filesystem uses and
 implement things.
 

 That's right.  For example, ext2 is doing directories in the page cache
 of the directory inode, so there's a cache alias between the block
 device page cache and the directory inode page cache.

 Now I guess I need to see how difficult a patch would be to give
 filesystems magic inodes to keep their metadata buffer heads in.

 Not hard, the block device inode is already a magic inode for metadata
 buffer heads.  You could just make another one attached to the bdev.

 But, I don't think I fully understand the problem you're trying to
 solve?


So the start:
When we write buffers from the buffer cache we clear buffer_dirty
but not PageDirty

So try_to_free_buffers() will mark any page with clean buffer_heads
that is not clean itself clean.

The ramdisk set pages dirty to keep them from being removed from the
page cache, just like ramfs.

Unfortunately when those dirty ramdisk pages get buffers on them and
those buffers all go clean and we are trying to reclaim buffer_heads
we drop those pages from the page cache.   Ouch!

We can fix the ramdisk by setting making certain that buffer_heads
on ramdisk pages stay dirty as well.  The problem is this breaks
filesystems like reiserfs and ext3 that expect to be able to make 
buffer_heads clean sometimes.

There are other ways to solve this for ramdisks, such as changing
where ramdisks are stored.  However fixing the ramdisks this way
still leaves the general problem that there are other paths to the
filesystem metadata buffers, and those other paths cause the code
to be complicated and buggy.

So I'm trying to see if we can untangle this Gordian knot, so the
code because more easily maintainable.  

To make the buffer cache a helper library instead of require
infrastructure, it looks like two things need to happen.
- Move metadata buffer heads off block devices page cache entries, 
- Communicate the mappings of data pages to block device sectors
  in a generic way without buffer heads.

How we ultimately fix the ramdisk tends to depend on how we untangle
the buffer head problem.  Right now the only simple solution is to
suppress try_to_free_buffers, which is a bit ugly.  We can also come
up with a completely separate store for the pages in the buffer cache
but if we wind up moving the metadata buffer heads anyway then that
should not be necessary.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Christian Borntraeger
Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman:
 Did you have both of my changes applied?
 To init_page_buffer() and to the ramdisk_set_dirty_page?

Yes, I removed my patch and applied both patches from you. 

Christian
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Christian Borntraeger [EMAIL PROTECTED] writes:

 Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman:
 Did you have both of my changes applied?
 To init_page_buffer() and to the ramdisk_set_dirty_page?

 Yes, I removed my patch and applied both patches from you. 

Thanks.

Grr. Inconsistent rules on a core piece of infrastructure.
It looks like that if there is any trivial/minimal fix it
is based on your patch suppressing try_to_free_buffers.  Ugh.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 15:30 -0600, Eric W. Biederman wrote:
 Chris Mason [EMAIL PROTECTED] writes:
 
  Thinking about it.  I don't believe anyone has ever intentionally built
  a filesystem tool that depends on being able to modify a file systems
  metadata buffer heads while the filesystem is running, and doing that
  would seem to be fragile as it would require a lot of cooperation
  between the tool and the filesystem about how the filesystem uses and
  implement things.
  
 
  That's right.  For example, ext2 is doing directories in the page cache
  of the directory inode, so there's a cache alias between the block
  device page cache and the directory inode page cache.
 
  Now I guess I need to see how difficult a patch would be to give
  filesystems magic inodes to keep their metadata buffer heads in.
 
  Not hard, the block device inode is already a magic inode for metadata
  buffer heads.  You could just make another one attached to the bdev.
 
  But, I don't think I fully understand the problem you're trying to
  solve?
 
 
 So the start:
 When we write buffers from the buffer cache we clear buffer_dirty
 but not PageDirty

 So try_to_free_buffers() will mark any page with clean buffer_heads
 that is not clean itself clean.
 
 The ramdisk set pages dirty to keep them from being removed from the
 page cache, just like ramfs.
 
So, the problem is using the Dirty bit to indicate pinned.  You're
completely right that our current setup of buffer heads and pages and
filesystem metadata is complex and difficult.

But, moving the buffer heads off of the page cache pages isn't going to
make it any easier to use dirty as pinned, especially in the face of
buffer_head users for file data pages.

You've already seen Nick fsblock code, but you can see my general
approach to replacing buffer heads here:

http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h

(alpha quality implementation in extent_map.c and users in inode.c)  The
basic idea is to do extent based record keeping for mapping and state of
things in the filesystem, and to avoid attaching these things to the
page.

 Unfortunately when those dirty ramdisk pages get buffers on them and
 those buffers all go clean and we are trying to reclaim buffer_heads
 we drop those pages from the page cache.   Ouch!
 
 We can fix the ramdisk by setting making certain that buffer_heads
 on ramdisk pages stay dirty as well.  The problem is this breaks
 filesystems like reiserfs and ext3 that expect to be able to make 
 buffer_heads clean sometimes.
 
 There are other ways to solve this for ramdisks, such as changing
 where ramdisks are stored.  However fixing the ramdisks this way
 still leaves the general problem that there are other paths to the
 filesystem metadata buffers, and those other paths cause the code
 to be complicated and buggy.
 
 So I'm trying to see if we can untangle this Gordian knot, so the
 code because more easily maintainable.  
 

Don't get me wrong, I'd love to see a simple and coherent fix for what
reiserfs and ext3 do with buffer head state, but I think for the short
term you're best off pinning the ramdisk pages via some other means.

-chris


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason [EMAIL PROTECTED] writes:

 So, the problem is using the Dirty bit to indicate pinned.  You're
 completely right that our current setup of buffer heads and pages and
 filesystpem metadata is complex and difficult.

 But, moving the buffer heads off of the page cache pages isn't going to
 make it any easier to use dirty as pinned, especially in the face of
 buffer_head users for file data pages.

Let me specific.  Not moving buffer_heads off of page cache pages,
moving buffer_heads off of the block devices page cache pages.

My problem is the coupling of how block devices are cached and the
implementation of buffer heads, and by removing that coupling
we can generally make things better.  Currently that coupling
means silly things like all block devices are cached in low memory.
Which probably isn't what you want if you actually have a use
for block devices.

For the ramdisk case in particular what this means is that there
are no more users that create buffer_head mappings on the block
device cache so using the dirty bit will be safe. 

Further it removes the nasty possibility of user space messing with
metadata buffer head state.  So the only way those cases can happen is
a code bug, or a hardware bug.

So I think by removing these unnecessary code paths things will
become easier to work with.

 You've already seen Nick fsblock code, but you can see my general
 approach to replacing buffer heads here:

 http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h

 (alpha quality implementation in extent_map.c and users in inode.c)  The
 basic idea is to do extent based record keeping for mapping and state of
 things in the filesystem, and to avoid attaching these things to the
 page.

Interesting.  Something to dig into.

 Don't get me wrong, I'd love to see a simple and coherent fix for what
 reiserfs and ext3 do with buffer head state, but I think for the short
 term you're best off pinning the ramdisk pages via some other means.

Yes.  And the problem is hard enough to trigger that a short term fix
is actually of debatable value.  The reason this hasn't shown up more
frequently is that it only ever triggers if you are in the buffer head
reclaim state, which on a 64bit box means you have to use  4K buffers
and have your ram cache another block device.  That plus most people
use initramfs these days.

For the short term we have Christian's other patch which simply
disables calling try_to_free_buffers.  Although that really feels
like a hack to me.

For 2.6.25 I think I have a shot at fixing these things cleanly.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Chris Mason
On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote:
 Chris Mason [EMAIL PROTECTED] writes:
 
  So, the problem is using the Dirty bit to indicate pinned.  You're
  completely right that our current setup of buffer heads and pages and
  filesystpem metadata is complex and difficult.
 
  But, moving the buffer heads off of the page cache pages isn't going to
  make it any easier to use dirty as pinned, especially in the face of
  buffer_head users for file data pages.
 
 Let me specific.  Not moving buffer_heads off of page cache pages,
 moving buffer_heads off of the block devices page cache pages.
 
 My problem is the coupling of how block devices are cached and the
 implementation of buffer heads, and by removing that coupling
 we can generally make things better.  Currently that coupling
 means silly things like all block devices are cached in low memory.
 Which probably isn't what you want if you actually have a use
 for block devices.
 
 For the ramdisk case in particular what this means is that there
 are no more users that create buffer_head mappings on the block
 device cache so using the dirty bit will be safe.

Ok, we move the buffer heads off to a different inode, and that indoe
has pages.  The pages on the inode still need to get pinned, how does
that pinning happen?

The problem you described where someone cleans a page because the buffer
heads are clean happens already without help from userland.  So, keeping
the pages away from userland won't save them from cleaning.

Sorry if I'm reading your suggestion wrong...

-chris


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Eric W. Biederman
Chris Mason [EMAIL PROTECTED] writes:

 On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote:
 Chris Mason [EMAIL PROTECTED] writes:
 
  So, the problem is using the Dirty bit to indicate pinned.  You're
  completely right that our current setup of buffer heads and pages and
  filesystpem metadata is complex and difficult.
 
  But, moving the buffer heads off of the page cache pages isn't going to
  make it any easier to use dirty as pinned, especially in the face of
  buffer_head users for file data pages.
 
 Let me specific.  Not moving buffer_heads off of page cache pages,
 moving buffer_heads off of the block devices page cache pages.
 
 My problem is the coupling of how block devices are cached and the
 implementation of buffer heads, and by removing that coupling
 we can generally make things better.  Currently that coupling
 means silly things like all block devices are cached in low memory.
 Which probably isn't what you want if you actually have a use
 for block devices.
 
 For the ramdisk case in particular what this means is that there
 are no more users that create buffer_head mappings on the block
 device cache so using the dirty bit will be safe.

 Ok, we move the buffer heads off to a different inode, and that indoe
 has pages.  The pages on the inode still need to get pinned, how does
 that pinning happen?

 The problem you described where someone cleans a page because the buffer
 heads are clean happens already without help from userland.  So, keeping
 the pages away from userland won't save them from cleaning.

 Sorry if I'm reading your suggestion wrong...

Yes.  I was suggesting to continue to pin the pages for the page
cache pages block device inode, and having the buffer cache live
on some other inode.  Thus not causing me problems by adding clean
buffer_heads to my dirty pages.

Eric

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-17 Thread Christian Borntraeger
Eric,

Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger:
 Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
 
  fs/buffer.c |    3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
   drivers/block/rd.c |   13 +
   1 files changed, 1 insertions(+), 12 deletions(-)
 
 Your patches look sane so far. I have applied both patches, and the problem 
 seems gone. I will try to get these patches to our testers.
 
 As long as they dont find new problems:

Our testers did only a short test, and then they were stopped by problems with
reiserfs. At the moment I cannot say for sure if your patch caused this, but 
we got the following BUG

ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr 
storage.
[ cut here ]
kernel BUG at 
/home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117!
illegal operation: 0001 [#1]
Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur
CPU:3Not tainted
Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88)
Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0
Krnl GPRS: 0002 7411b5c8 002b 
   7b04d000 0001  76d1de00
   7513eec0 0003 0012 77f77680
   7411b608 fb343b7e fb34404a 7513ee50
Krnl Code: fb344374: a7210002   tmll%r2,2
   fb344378: a7840004   brc 8,fb344380
   fb34437c: a7f40001   brc 15,fb34437e
  fb344380: 5810d8c2   l   %r1,2242(%r13)
   fb344384: 5820b03c   l   %r2,60(%r11)
   fb344388: 0de1   basr%r14,%r1
   fb34438a: 5810d90e   l   %r1,2318(%r13)
   fb34438e: 5820b03c   l   %r2,60(%r11)


Looking at the code, this really seems related to dirty buffers, so your patch
is the main suspect at the moment. 

if (!barrier) {
/* If there was a write error in the journal - we can't commit
 * this transaction - it will be invalid and, if successful,
 * will just end up propagating the write error out to
 * the file system. */
if (likely(!retval  !reiserfs_is_journal_aborted (journal))) {
if (buffer_dirty(jl-j_commit_bh))
1117BUG();
mark_buffer_dirty(jl-j_commit_bh) ;
sync_dirty_buffer(jl-j_commit_bh) ;
}
}

Christian
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Nick Piggin
On Wednesday 17 October 2007 05:06, Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
> >> I have not observed this case but it is possible to get a dirty page
> >> cache with clean buffer heads if we get a clean ramdisk page with
> >> buffer heads generated by a filesystem calling __getblk and then write
> >> to that page from user space through the block device.  Then we just
> >> need to hit the proper window and try_to_free_buffers() will mark that
> >> page clean and eventually drop it.  Ouch!
> >>
> >> To fix this use the generic __set_page_dirty_buffers in the ramdisk
> >> code so that when we mark a page dirty we also mark it's buffer heads
> >> dirty.
> >
> > Hmm, so we can also have some filesystems writing their own buffers
> > out by hand (clear_buffer_dirty, submit buffer for IO). Other places
> > will do similarly dodgy things with filesystem metadata
> > (fsync_buffers_list, for example).
> >
> > So your buffers get cleaned again, then your pages get cleaned.
>
> So I just took a little bit of time to look at and think about
> the path you are referring to, and I don't see a problem.
>
> The rule with the buffer dirty bit is that you first clear it
> and then you submit the write.  When the write request finally
> makes it's way to rd.c we copy the data if necessary and call
> set_page_dirty.  Which will then mark the page and the buffer
> dirty again.

Oh, maybe you're right. I didn't see it redirty the page there.


> In essence the ramdisk code just attempts to lock buffers in
> ram by setting their dirty bit, just like we do for pages
> in ramfs.

Yeah, which is half the reason why its so complicated. Logically
it should just hold another reference on the pages rather than
interfere with pagecache state, but it can't do that because it
doesn't always know when a new page is inserted.


> > While I said it was a good fix when I saw the patch earlier, I think
> > it's not closing the entire hole, and as such, Christian's patch is
> > probably the way to go for stable.
>
> I thought through the logic in try_to_free_buffers and it actually
> makes sense to me now.  mark_buffer_dirty sets the page dirty bit
> so dirty buffers reside on dirty pages.  When we submit I/O we
> aren't guaranteed to submit all of the dirty buffers for a page
> at once, so we don't clear the page dirty bit.  With the result
> that we can end up with pages with the dirty bit set but all of
> their buffers are clean.

Yep.


> Since we rather deliberately allow truly clean pages to be dropped
> from the ramdisk overriding try_to_free_buffer_pages looks wrong
> because then except for invalidate we can not remove buffers
> from truly clean pages.

Yeah, if your fix works I guess it is better to use it and converge
code rather than diverge it even more.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Eric W. Biederman
Nick Piggin <[EMAIL PROTECTED]> writes:

> On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
>> I have not observed this case but it is possible to get a dirty page
>> cache with clean buffer heads if we get a clean ramdisk page with
>> buffer heads generated by a filesystem calling __getblk and then write
>> to that page from user space through the block device.  Then we just
>> need to hit the proper window and try_to_free_buffers() will mark that
>> page clean and eventually drop it.  Ouch!
>>
>> To fix this use the generic __set_page_dirty_buffers in the ramdisk
>> code so that when we mark a page dirty we also mark it's buffer heads
>> dirty.
>
> Hmm, so we can also have some filesystems writing their own buffers
> out by hand (clear_buffer_dirty, submit buffer for IO). Other places
> will do similarly dodgy things with filesystem metadata
> (fsync_buffers_list, for example).
>
> So your buffers get cleaned again, then your pages get cleaned.

So I just took a little bit of time to look at and think about
the path you are referring to, and I don't see a problem.

The rule with the buffer dirty bit is that you first clear it
and then you submit the write.  When the write request finally
makes it's way to rd.c we copy the data if necessary and call
set_page_dirty.  Which will then mark the page and the buffer
dirty again.

In essence the ramdisk code just attempts to lock buffers in
ram by setting their dirty bit, just like we do for pages
in ramfs.

The only case where I see the dirty bit getting cleared without
submitting I/O is when dirty state doesn't matter, in which
case if we get a page full buffers all of whose data we don't care
about it is legitimate to drop the page.

As for ramdisk_writepage it probably makes sense to remove it,
as the generic code paths seem to work as well or better the
writepage method is NULL.  However if we do keep it we should call
set_page_dirty there on general principles.

> While I said it was a good fix when I saw the patch earlier, I think
> it's not closing the entire hole, and as such, Christian's patch is
> probably the way to go for stable.

I thought through the logic in try_to_free_buffers and it actually
makes sense to me now.  mark_buffer_dirty sets the page dirty bit
so dirty buffers reside on dirty pages.  When we submit I/O we
aren't guaranteed to submit all of the dirty buffers for a page
at once, so we don't clear the page dirty bit.  With the result
that we can end up with pages with the dirty bit set but all of
their buffers are clean.

Since we rather deliberately allow truly clean pages to be dropped
from the ramdisk overriding try_to_free_buffer_pages looks wrong
because then except for invalidate we can not remove buffers
from truly clean pages.

> For mainline, *if* we want to keep the old rd.c around at all, I
> don't see any harm in this patch so long as Christian's is merged
> as well. Sharing common code is always good.

I do agree that with the amount of code duplication in the buffer
cache that locking pages into the buffer cache seems very error prone,
and difficult to maintain.  So rewriting rd.c to store it's pages
elsewhere sounds very promising.

While I can see Christian's patch as fixing the symptom.  I have
a very hard time see it as correct.  If we did something more
elaborate to replace try_to_free_buffer_pages such that
we could drop buffers from clean buffer cache pages when they
became such and so were only suppressing the drop the dirty
bit logic for pages that contain data we want to keep I would be ok
with it.Just totally skipping buffer head freeing just feels
wrong to me.

Eric


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Eric W. Biederman
Christian Borntraeger <[EMAIL PROTECTED]> writes:

> Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:
>
>> fs/buffer.c |    3 +++
>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>  drivers/block/rd.c |   13 +
>>  1 files changed, 1 insertions(+), 12 deletions(-)
>
> Your patches look sane so far. I have applied both patches, and the problem 
> seems gone. I will try to get these patches to our testers.
>
> As long as they dont find new problems:

Sounds good.  Please make certain to test reiserfs as well as ext2+ as
it seems to strain the ramdisk code more aggressively.

> Acked-by: Christian Borntraeger <[EMAIL PROTECTED]>
>
> Nick, Eric. What is the best patch for the stable series? Both patches from
> Eric or mine? I CCed stable, so they know that something is coming.

My gut feel says my patches assuming everything tests out ok, as
they actually fix the problem instead of papering over it, and there
isn't really a size difference.

In addition the change to init_page_buffers is interesting all by
itself.  With that patch we now have the option of running block
devices in mode where we don't bother to cache the buffer heads
which should reduce memory pressure a bit.  Not that an enhancement
like that will show up in stable, but...

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Christian Borntraeger
Am Dienstag, 16. Oktober 2007 schrieb Nick Piggin:
> While I said it was a good fix when I saw the patch earlier, I think
> it's not closing the entire hole, and as such, Christian's patch is
> probably the way to go for stable.

That would be my preferred method. Merge Erics and my fixup for 2.6.24-rc.
The only open questions is, what was the reiserfs problem? Is it still
causes by Erics patches?

> For mainline, *if* we want to keep the old rd.c around at all, I
> don't see any harm in this patch so long as Christian's is merged
> as well. Sharing common code is always good.

While the merge window is still open, to me the new ramdisk code seems
more like a 2.6.25-rc thing. We actually should test the behaviour in 
low memory scenarios. What do you think?

Christian
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread rubenjr_22
hi!to everybody i need sex video

--
This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread rubenjr_22
hi! i need sex video

--
This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Nick Piggin
On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
> I have not observed this case but it is possible to get a dirty page
> cache with clean buffer heads if we get a clean ramdisk page with
> buffer heads generated by a filesystem calling __getblk and then write
> to that page from user space through the block device.  Then we just
> need to hit the proper window and try_to_free_buffers() will mark that
> page clean and eventually drop it.  Ouch!
>
> To fix this use the generic __set_page_dirty_buffers in the ramdisk
> code so that when we mark a page dirty we also mark it's buffer heads
> dirty.

Hmm, so we can also have some filesystems writing their own buffers
out by hand (clear_buffer_dirty, submit buffer for IO). Other places
will do similarly dodgy things with filesystem metadata
(fsync_buffers_list, for example).

So your buffers get cleaned again, then your pages get cleaned.

While I said it was a good fix when I saw the patch earlier, I think
it's not closing the entire hole, and as such, Christian's patch is
probably the way to go for stable.

For mainline, *if* we want to keep the old rd.c around at all, I
don't see any harm in this patch so long as Christian's is merged
as well. Sharing common code is always good.

>
> Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> ---
>  drivers/block/rd.c |   13 +
>  1 files changed, 1 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/block/rd.c b/drivers/block/rd.c
> index 701ea77..84163da 100644
> --- a/drivers/block/rd.c
> +++ b/drivers/block/rd.c
> @@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space
> *mapping, return 0;
>  }
>
> -/*
> - * ramdisk blockdev pages have their own ->set_page_dirty() because we
> don't - * want them to contribute to dirty memory accounting.
> - */
> -static int ramdisk_set_page_dirty(struct page *page)
> -{
> - if (!TestSetPageDirty(page))
> - return 1;
> - return 0;
> -}
> -
>  static const struct address_space_operations ramdisk_aops = {
>   .readpage   = ramdisk_readpage,
>   .prepare_write  = ramdisk_prepare_write,
>   .commit_write   = ramdisk_commit_write,
>   .writepage  = ramdisk_writepage,
> - .set_page_dirty = ramdisk_set_page_dirty,
> + .set_page_dirty = __set_page_dirty_buffers,
>   .writepages = ramdisk_writepages,
>  };
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Christian Borntraeger
Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:

> fs/buffer.c |    3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>  drivers/block/rd.c |   13 +
>  1 files changed, 1 insertions(+), 12 deletions(-)

Your patches look sane so far. I have applied both patches, and the problem 
seems gone. I will try to get these patches to our testers.

As long as they dont find new problems:

Acked-by: Christian Borntraeger <[EMAIL PROTECTED]>

Nick, Eric. What is the best patch for the stable series? Both patches from
Eric or mine? I CCed stable, so they know that something is coming.

Christian
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Christian Borntraeger
Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:

 fs/buffer.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
  drivers/block/rd.c |   13 +
  1 files changed, 1 insertions(+), 12 deletions(-)

Your patches look sane so far. I have applied both patches, and the problem 
seems gone. I will try to get these patches to our testers.

As long as they dont find new problems:

Acked-by: Christian Borntraeger [EMAIL PROTECTED]

Nick, Eric. What is the best patch for the stable series? Both patches from
Eric or mine? I CCed stable, so they know that something is coming.

Christian
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread rubenjr_22
hi! i need sex video

--
This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Nick Piggin
On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
 I have not observed this case but it is possible to get a dirty page
 cache with clean buffer heads if we get a clean ramdisk page with
 buffer heads generated by a filesystem calling __getblk and then write
 to that page from user space through the block device.  Then we just
 need to hit the proper window and try_to_free_buffers() will mark that
 page clean and eventually drop it.  Ouch!

 To fix this use the generic __set_page_dirty_buffers in the ramdisk
 code so that when we mark a page dirty we also mark it's buffer heads
 dirty.

Hmm, so we can also have some filesystems writing their own buffers
out by hand (clear_buffer_dirty, submit buffer for IO). Other places
will do similarly dodgy things with filesystem metadata
(fsync_buffers_list, for example).

So your buffers get cleaned again, then your pages get cleaned.

While I said it was a good fix when I saw the patch earlier, I think
it's not closing the entire hole, and as such, Christian's patch is
probably the way to go for stable.

For mainline, *if* we want to keep the old rd.c around at all, I
don't see any harm in this patch so long as Christian's is merged
as well. Sharing common code is always good.


 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
 ---
  drivers/block/rd.c |   13 +
  1 files changed, 1 insertions(+), 12 deletions(-)

 diff --git a/drivers/block/rd.c b/drivers/block/rd.c
 index 701ea77..84163da 100644
 --- a/drivers/block/rd.c
 +++ b/drivers/block/rd.c
 @@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space
 *mapping, return 0;
  }

 -/*
 - * ramdisk blockdev pages have their own -set_page_dirty() because we
 don't - * want them to contribute to dirty memory accounting.
 - */
 -static int ramdisk_set_page_dirty(struct page *page)
 -{
 - if (!TestSetPageDirty(page))
 - return 1;
 - return 0;
 -}
 -
  static const struct address_space_operations ramdisk_aops = {
   .readpage   = ramdisk_readpage,
   .prepare_write  = ramdisk_prepare_write,
   .commit_write   = ramdisk_commit_write,
   .writepage  = ramdisk_writepage,
 - .set_page_dirty = ramdisk_set_page_dirty,
 + .set_page_dirty = __set_page_dirty_buffers,
   .writepages = ramdisk_writepages,
  };
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread rubenjr_22
hi!to everybody i need sex video

--
This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Christian Borntraeger
Am Dienstag, 16. Oktober 2007 schrieb Nick Piggin:
 While I said it was a good fix when I saw the patch earlier, I think
 it's not closing the entire hole, and as such, Christian's patch is
 probably the way to go for stable.

That would be my preferred method. Merge Erics and my fixup for 2.6.24-rc.
The only open questions is, what was the reiserfs problem? Is it still
causes by Erics patches?

 For mainline, *if* we want to keep the old rd.c around at all, I
 don't see any harm in this patch so long as Christian's is merged
 as well. Sharing common code is always good.

While the merge window is still open, to me the new ramdisk code seems
more like a 2.6.25-rc thing. We actually should test the behaviour in 
low memory scenarios. What do you think?

Christian
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Eric W. Biederman
Christian Borntraeger [EMAIL PROTECTED] writes:

 Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman:

 fs/buffer.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
  drivers/block/rd.c |   13 +
  1 files changed, 1 insertions(+), 12 deletions(-)

 Your patches look sane so far. I have applied both patches, and the problem 
 seems gone. I will try to get these patches to our testers.

 As long as they dont find new problems:

Sounds good.  Please make certain to test reiserfs as well as ext2+ as
it seems to strain the ramdisk code more aggressively.

 Acked-by: Christian Borntraeger [EMAIL PROTECTED]

 Nick, Eric. What is the best patch for the stable series? Both patches from
 Eric or mine? I CCed stable, so they know that something is coming.

My gut feel says my patches assuming everything tests out ok, as
they actually fix the problem instead of papering over it, and there
isn't really a size difference.

In addition the change to init_page_buffers is interesting all by
itself.  With that patch we now have the option of running block
devices in mode where we don't bother to cache the buffer heads
which should reduce memory pressure a bit.  Not that an enhancement
like that will show up in stable, but...

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Eric W. Biederman
Nick Piggin [EMAIL PROTECTED] writes:

 On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
 I have not observed this case but it is possible to get a dirty page
 cache with clean buffer heads if we get a clean ramdisk page with
 buffer heads generated by a filesystem calling __getblk and then write
 to that page from user space through the block device.  Then we just
 need to hit the proper window and try_to_free_buffers() will mark that
 page clean and eventually drop it.  Ouch!

 To fix this use the generic __set_page_dirty_buffers in the ramdisk
 code so that when we mark a page dirty we also mark it's buffer heads
 dirty.

 Hmm, so we can also have some filesystems writing their own buffers
 out by hand (clear_buffer_dirty, submit buffer for IO). Other places
 will do similarly dodgy things with filesystem metadata
 (fsync_buffers_list, for example).

 So your buffers get cleaned again, then your pages get cleaned.

So I just took a little bit of time to look at and think about
the path you are referring to, and I don't see a problem.

The rule with the buffer dirty bit is that you first clear it
and then you submit the write.  When the write request finally
makes it's way to rd.c we copy the data if necessary and call
set_page_dirty.  Which will then mark the page and the buffer
dirty again.

In essence the ramdisk code just attempts to lock buffers in
ram by setting their dirty bit, just like we do for pages
in ramfs.

The only case where I see the dirty bit getting cleared without
submitting I/O is when dirty state doesn't matter, in which
case if we get a page full buffers all of whose data we don't care
about it is legitimate to drop the page.

As for ramdisk_writepage it probably makes sense to remove it,
as the generic code paths seem to work as well or better the
writepage method is NULL.  However if we do keep it we should call
set_page_dirty there on general principles.

 While I said it was a good fix when I saw the patch earlier, I think
 it's not closing the entire hole, and as such, Christian's patch is
 probably the way to go for stable.

I thought through the logic in try_to_free_buffers and it actually
makes sense to me now.  mark_buffer_dirty sets the page dirty bit
so dirty buffers reside on dirty pages.  When we submit I/O we
aren't guaranteed to submit all of the dirty buffers for a page
at once, so we don't clear the page dirty bit.  With the result
that we can end up with pages with the dirty bit set but all of
their buffers are clean.

Since we rather deliberately allow truly clean pages to be dropped
from the ramdisk overriding try_to_free_buffer_pages looks wrong
because then except for invalidate we can not remove buffers
from truly clean pages.

 For mainline, *if* we want to keep the old rd.c around at all, I
 don't see any harm in this patch so long as Christian's is merged
 as well. Sharing common code is always good.

I do agree that with the amount of code duplication in the buffer
cache that locking pages into the buffer cache seems very error prone,
and difficult to maintain.  So rewriting rd.c to store it's pages
elsewhere sounds very promising.

While I can see Christian's patch as fixing the symptom.  I have
a very hard time see it as correct.  If we did something more
elaborate to replace try_to_free_buffer_pages such that
we could drop buffers from clean buffer cache pages when they
became such and so were only suppressing the drop the dirty
bit logic for pages that contain data we want to keep I would be ok
with it.Just totally skipping buffer head freeing just feels
wrong to me.

Eric


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-16 Thread Nick Piggin
On Wednesday 17 October 2007 05:06, Eric W. Biederman wrote:
 Nick Piggin [EMAIL PROTECTED] writes:
  On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote:
  I have not observed this case but it is possible to get a dirty page
  cache with clean buffer heads if we get a clean ramdisk page with
  buffer heads generated by a filesystem calling __getblk and then write
  to that page from user space through the block device.  Then we just
  need to hit the proper window and try_to_free_buffers() will mark that
  page clean and eventually drop it.  Ouch!
 
  To fix this use the generic __set_page_dirty_buffers in the ramdisk
  code so that when we mark a page dirty we also mark it's buffer heads
  dirty.
 
  Hmm, so we can also have some filesystems writing their own buffers
  out by hand (clear_buffer_dirty, submit buffer for IO). Other places
  will do similarly dodgy things with filesystem metadata
  (fsync_buffers_list, for example).
 
  So your buffers get cleaned again, then your pages get cleaned.

 So I just took a little bit of time to look at and think about
 the path you are referring to, and I don't see a problem.

 The rule with the buffer dirty bit is that you first clear it
 and then you submit the write.  When the write request finally
 makes it's way to rd.c we copy the data if necessary and call
 set_page_dirty.  Which will then mark the page and the buffer
 dirty again.

Oh, maybe you're right. I didn't see it redirty the page there.


 In essence the ramdisk code just attempts to lock buffers in
 ram by setting their dirty bit, just like we do for pages
 in ramfs.

Yeah, which is half the reason why its so complicated. Logically
it should just hold another reference on the pages rather than
interfere with pagecache state, but it can't do that because it
doesn't always know when a new page is inserted.


  While I said it was a good fix when I saw the patch earlier, I think
  it's not closing the entire hole, and as such, Christian's patch is
  probably the way to go for stable.

 I thought through the logic in try_to_free_buffers and it actually
 makes sense to me now.  mark_buffer_dirty sets the page dirty bit
 so dirty buffers reside on dirty pages.  When we submit I/O we
 aren't guaranteed to submit all of the dirty buffers for a page
 at once, so we don't clear the page dirty bit.  With the result
 that we can end up with pages with the dirty bit set but all of
 their buffers are clean.

Yep.


 Since we rather deliberately allow truly clean pages to be dropped
 from the ramdisk overriding try_to_free_buffer_pages looks wrong
 because then except for invalidate we can not remove buffers
 from truly clean pages.

Yeah, if your fix works I guess it is better to use it and converge
code rather than diverge it even more.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-15 Thread Eric W. Biederman

I have not observed this case but it is possible to get a dirty page
cache with clean buffer heads if we get a clean ramdisk page with
buffer heads generated by a filesystem calling __getblk and then write
to that page from user space through the block device.  Then we just
need to hit the proper window and try_to_free_buffers() will mark that
page clean and eventually drop it.  Ouch!

To fix this use the generic __set_page_dirty_buffers in the ramdisk
code so that when we mark a page dirty we also mark it's buffer heads
dirty.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 drivers/block/rd.c |   13 +
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/drivers/block/rd.c b/drivers/block/rd.c
index 701ea77..84163da 100644
--- a/drivers/block/rd.c
+++ b/drivers/block/rd.c
@@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space 
*mapping,
return 0;
 }
 
-/*
- * ramdisk blockdev pages have their own ->set_page_dirty() because we don't
- * want them to contribute to dirty memory accounting.
- */
-static int ramdisk_set_page_dirty(struct page *page)
-{
-   if (!TestSetPageDirty(page))
-   return 1;
-   return 0;
-}
-
 static const struct address_space_operations ramdisk_aops = {
.readpage   = ramdisk_readpage,
.prepare_write  = ramdisk_prepare_write,
.commit_write   = ramdisk_commit_write,
.writepage  = ramdisk_writepage,
-   .set_page_dirty = ramdisk_set_page_dirty,
+   .set_page_dirty = __set_page_dirty_buffers,
.writepages = ramdisk_writepages,
 };
 
-- 
1.5.3.rc6.17.g1911

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] rd: Mark ramdisk buffers heads dirty

2007-10-15 Thread Eric W. Biederman

I have not observed this case but it is possible to get a dirty page
cache with clean buffer heads if we get a clean ramdisk page with
buffer heads generated by a filesystem calling __getblk and then write
to that page from user space through the block device.  Then we just
need to hit the proper window and try_to_free_buffers() will mark that
page clean and eventually drop it.  Ouch!

To fix this use the generic __set_page_dirty_buffers in the ramdisk
code so that when we mark a page dirty we also mark it's buffer heads
dirty.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 drivers/block/rd.c |   13 +
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/drivers/block/rd.c b/drivers/block/rd.c
index 701ea77..84163da 100644
--- a/drivers/block/rd.c
+++ b/drivers/block/rd.c
@@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space 
*mapping,
return 0;
 }
 
-/*
- * ramdisk blockdev pages have their own -set_page_dirty() because we don't
- * want them to contribute to dirty memory accounting.
- */
-static int ramdisk_set_page_dirty(struct page *page)
-{
-   if (!TestSetPageDirty(page))
-   return 1;
-   return 0;
-}
-
 static const struct address_space_operations ramdisk_aops = {
.readpage   = ramdisk_readpage,
.prepare_write  = ramdisk_prepare_write,
.commit_write   = ramdisk_commit_write,
.writepage  = ramdisk_writepage,
-   .set_page_dirty = ramdisk_set_page_dirty,
+   .set_page_dirty = __set_page_dirty_buffers,
.writepages = ramdisk_writepages,
 };
 
-- 
1.5.3.rc6.17.g1911

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/