Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger <[EMAIL PROTECTED]> writes: > Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman: >> Grr. Inconsistent rules on a core piece of infrastructure. >> It looks like that if there is any trivial/minimal fix it >> is based on your patch suppressing try_to_free_buffers. Ugh. >> >> Eric > > Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24 > and doing a nicer fix (rd rewrite for example for post 2.6.24)? Looking at it. If we don't get carried away using our own private inode is barely more difficult then stomping on release_page and in a number of ways a whole lot more subtle. At least for 2.6.24 I think it makes a sane fix, and quite possibly as a back port as well. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger [EMAIL PROTECTED] writes: Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman: Grr. Inconsistent rules on a core piece of infrastructure. It looks like that if there is any trivial/minimal fix it is based on your patch suppressing try_to_free_buffers. Ugh. Eric Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24 and doing a nicer fix (rd rewrite for example for post 2.6.24)? Looking at it. If we don't get carried away using our own private inode is barely more difficult then stomping on release_page and in a number of ways a whole lot more subtle. At least for 2.6.24 I think it makes a sane fix, and quite possibly as a back port as well. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman: > Grr. Inconsistent rules on a core piece of infrastructure. > It looks like that if there is any trivial/minimal fix it > is based on your patch suppressing try_to_free_buffers. Ugh. > > Eric Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24 and doing a nicer fix (rd rewrite for example for post 2.6.24)? Christian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Donnerstag, 18. Oktober 2007 schrieb Eric W. Biederman: Grr. Inconsistent rules on a core piece of infrastructure. It looks like that if there is any trivial/minimal fix it is based on your patch suppressing try_to_free_buffers. Ugh. Eric Ok. What do you think about having my patch for 2.6.23 stable, for 2.6.24 and doing a nicer fix (rd rewrite for example for post 2.6.24)? Christian - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason <[EMAIL PROTECTED]> writes: > On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote: >> Chris Mason <[EMAIL PROTECTED]> writes: >> >> > So, the problem is using the Dirty bit to indicate pinned. You're >> > completely right that our current setup of buffer heads and pages and >> > filesystpem metadata is complex and difficult. >> > >> > But, moving the buffer heads off of the page cache pages isn't going to >> > make it any easier to use dirty as pinned, especially in the face of >> > buffer_head users for file data pages. >> >> Let me specific. Not moving buffer_heads off of page cache pages, >> moving buffer_heads off of the block devices page cache pages. >> >> My problem is the coupling of how block devices are cached and the >> implementation of buffer heads, and by removing that coupling >> we can generally make things better. Currently that coupling >> means silly things like all block devices are cached in low memory. >> Which probably isn't what you want if you actually have a use >> for block devices. >> >> For the ramdisk case in particular what this means is that there >> are no more users that create buffer_head mappings on the block >> device cache so using the dirty bit will be safe. > > Ok, we move the buffer heads off to a different inode, and that indoe > has pages. The pages on the inode still need to get pinned, how does > that pinning happen? > The problem you described where someone cleans a page because the buffer > heads are clean happens already without help from userland. So, keeping > the pages away from userland won't save them from cleaning. > > Sorry if I'm reading your suggestion wrong... Yes. I was suggesting to continue to pin the pages for the page cache pages block device inode, and having the buffer cache live on some other inode. Thus not causing me problems by adding clean buffer_heads to my dirty pages. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote: > Chris Mason <[EMAIL PROTECTED]> writes: > > > So, the problem is using the Dirty bit to indicate pinned. You're > > completely right that our current setup of buffer heads and pages and > > filesystpem metadata is complex and difficult. > > > > But, moving the buffer heads off of the page cache pages isn't going to > > make it any easier to use dirty as pinned, especially in the face of > > buffer_head users for file data pages. > > Let me specific. Not moving buffer_heads off of page cache pages, > moving buffer_heads off of the block devices page cache pages. > > My problem is the coupling of how block devices are cached and the > implementation of buffer heads, and by removing that coupling > we can generally make things better. Currently that coupling > means silly things like all block devices are cached in low memory. > Which probably isn't what you want if you actually have a use > for block devices. > > For the ramdisk case in particular what this means is that there > are no more users that create buffer_head mappings on the block > device cache so using the dirty bit will be safe. Ok, we move the buffer heads off to a different inode, and that indoe has pages. The pages on the inode still need to get pinned, how does that pinning happen? The problem you described where someone cleans a page because the buffer heads are clean happens already without help from userland. So, keeping the pages away from userland won't save them from cleaning. Sorry if I'm reading your suggestion wrong... -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason <[EMAIL PROTECTED]> writes: > So, the problem is using the Dirty bit to indicate pinned. You're > completely right that our current setup of buffer heads and pages and > filesystpem metadata is complex and difficult. > > But, moving the buffer heads off of the page cache pages isn't going to > make it any easier to use dirty as pinned, especially in the face of > buffer_head users for file data pages. Let me specific. Not moving buffer_heads off of page cache pages, moving buffer_heads off of the block devices page cache pages. My problem is the coupling of how block devices are cached and the implementation of buffer heads, and by removing that coupling we can generally make things better. Currently that coupling means silly things like all block devices are cached in low memory. Which probably isn't what you want if you actually have a use for block devices. For the ramdisk case in particular what this means is that there are no more users that create buffer_head mappings on the block device cache so using the dirty bit will be safe. Further it removes the nasty possibility of user space messing with metadata buffer head state. So the only way those cases can happen is a code bug, or a hardware bug. So I think by removing these unnecessary code paths things will become easier to work with. > You've already seen Nick fsblock code, but you can see my general > approach to replacing buffer heads here: > > http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h > > (alpha quality implementation in extent_map.c and users in inode.c) The > basic idea is to do extent based record keeping for mapping and state of > things in the filesystem, and to avoid attaching these things to the > page. Interesting. Something to dig into. > Don't get me wrong, I'd love to see a simple and coherent fix for what > reiserfs and ext3 do with buffer head state, but I think for the short > term you're best off pinning the ramdisk pages via some other means. Yes. And the problem is hard enough to trigger that a short term fix is actually of debatable value. The reason this hasn't shown up more frequently is that it only ever triggers if you are in the buffer head reclaim state, which on a 64bit box means you have to use < 4K buffers and have your ram cache another block device. That plus most people use initramfs these days. For the short term we have Christian's other patch which simply disables calling try_to_free_buffers. Although that really feels like a hack to me. For 2.6.25 I think I have a shot at fixing these things cleanly. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 15:30 -0600, Eric W. Biederman wrote: > Chris Mason <[EMAIL PROTECTED]> writes: > > >> Thinking about it. I don't believe anyone has ever intentionally built > >> a filesystem tool that depends on being able to modify a file systems > >> metadata buffer heads while the filesystem is running, and doing that > >> would seem to be fragile as it would require a lot of cooperation > >> between the tool and the filesystem about how the filesystem uses and > >> implement things. > >> > > > > That's right. For example, ext2 is doing directories in the page cache > > of the directory inode, so there's a cache alias between the block > > device page cache and the directory inode page cache. > > > >> Now I guess I need to see how difficult a patch would be to give > >> filesystems magic inodes to keep their metadata buffer heads in. > > > > Not hard, the block device inode is already a magic inode for metadata > > buffer heads. You could just make another one attached to the bdev. > > > > But, I don't think I fully understand the problem you're trying to > > solve? > > > So the start: > When we write buffers from the buffer cache we clear buffer_dirty > but not PageDirty > > So try_to_free_buffers() will mark any page with clean buffer_heads > that is not clean itself clean. > > The ramdisk set pages dirty to keep them from being removed from the > page cache, just like ramfs. > So, the problem is using the Dirty bit to indicate pinned. You're completely right that our current setup of buffer heads and pages and filesystem metadata is complex and difficult. But, moving the buffer heads off of the page cache pages isn't going to make it any easier to use dirty as pinned, especially in the face of buffer_head users for file data pages. You've already seen Nick fsblock code, but you can see my general approach to replacing buffer heads here: http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h (alpha quality implementation in extent_map.c and users in inode.c) The basic idea is to do extent based record keeping for mapping and state of things in the filesystem, and to avoid attaching these things to the page. > Unfortunately when those dirty ramdisk pages get buffers on them and > those buffers all go clean and we are trying to reclaim buffer_heads > we drop those pages from the page cache. Ouch! > > We can fix the ramdisk by setting making certain that buffer_heads > on ramdisk pages stay dirty as well. The problem is this breaks > filesystems like reiserfs and ext3 that expect to be able to make > buffer_heads clean sometimes. > > There are other ways to solve this for ramdisks, such as changing > where ramdisks are stored. However fixing the ramdisks this way > still leaves the general problem that there are other paths to the > filesystem metadata buffers, and those other paths cause the code > to be complicated and buggy. > > So I'm trying to see if we can untangle this Gordian knot, so the > code because more easily maintainable. > Don't get me wrong, I'd love to see a simple and coherent fix for what reiserfs and ext3 do with buffer head state, but I think for the short term you're best off pinning the ramdisk pages via some other means. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger <[EMAIL PROTECTED]> writes: > Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman: >> Did you have both of my changes applied? >> To init_page_buffer() and to the ramdisk_set_dirty_page? > > Yes, I removed my patch and applied both patches from you. Thanks. Grr. Inconsistent rules on a core piece of infrastructure. It looks like that if there is any trivial/minimal fix it is based on your patch suppressing try_to_free_buffers. Ugh. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman: > Did you have both of my changes applied? > To init_page_buffer() and to the ramdisk_set_dirty_page? Yes, I removed my patch and applied both patches from you. Christian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason <[EMAIL PROTECTED]> writes: >> Thinking about it. I don't believe anyone has ever intentionally built >> a filesystem tool that depends on being able to modify a file systems >> metadata buffer heads while the filesystem is running, and doing that >> would seem to be fragile as it would require a lot of cooperation >> between the tool and the filesystem about how the filesystem uses and >> implement things. >> > > That's right. For example, ext2 is doing directories in the page cache > of the directory inode, so there's a cache alias between the block > device page cache and the directory inode page cache. > >> Now I guess I need to see how difficult a patch would be to give >> filesystems magic inodes to keep their metadata buffer heads in. > > Not hard, the block device inode is already a magic inode for metadata > buffer heads. You could just make another one attached to the bdev. > > But, I don't think I fully understand the problem you're trying to > solve? So the start: When we write buffers from the buffer cache we clear buffer_dirty but not PageDirty So try_to_free_buffers() will mark any page with clean buffer_heads that is not clean itself clean. The ramdisk set pages dirty to keep them from being removed from the page cache, just like ramfs. Unfortunately when those dirty ramdisk pages get buffers on them and those buffers all go clean and we are trying to reclaim buffer_heads we drop those pages from the page cache. Ouch! We can fix the ramdisk by setting making certain that buffer_heads on ramdisk pages stay dirty as well. The problem is this breaks filesystems like reiserfs and ext3 that expect to be able to make buffer_heads clean sometimes. There are other ways to solve this for ramdisks, such as changing where ramdisks are stored. However fixing the ramdisks this way still leaves the general problem that there are other paths to the filesystem metadata buffers, and those other paths cause the code to be complicated and buggy. So I'm trying to see if we can untangle this Gordian knot, so the code because more easily maintainable. To make the buffer cache a helper library instead of require infrastructure, it looks like two things need to happen. - Move metadata buffer heads off block devices page cache entries, - Communicate the mappings of data pages to block device sectors in a generic way without buffer heads. How we ultimately fix the ramdisk tends to depend on how we untangle the buffer head problem. Right now the only simple solution is to suppress try_to_free_buffers, which is a bit ugly. We can also come up with a completely separate store for the pages in the buffer cache but if we wind up moving the metadata buffer heads anyway then that should not be necessary. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 14:29 -0600, Eric W. Biederman wrote: > Chris Mason <[EMAIL PROTECTED]> writes: > > > In this case, the commit block isn't allowed to be dirty before reiserfs > > decides it is safe to write it. The journal code expects it is the only > > spot in the kernel setting buffer heads dirty, and it only does so after > > the rest of the log blocks are safely on disk. > > Ok. So the journal code here fundamentally depends on being able to > control the order of the writes, and something else being able to set > the buffer head dirty messes up that control. > Right. > >> At the same time I increasingly don't think we should allow user space > >> to dirty or update our filesystem metadata buffer heads. That seems > >> like asking for trouble. > >> > > > > Demanding trouble ;) > > Looks like it. There are even comments in jbd about the same class > of problems. Apparently dump and tune2fs on mounted filesystems have > triggered some of these issues. The practical question is any of this > trouble worth handling. > > Thinking about it. I don't believe anyone has ever intentionally built > a filesystem tool that depends on being able to modify a file systems > metadata buffer heads while the filesystem is running, and doing that > would seem to be fragile as it would require a lot of cooperation > between the tool and the filesystem about how the filesystem uses and > implement things. > That's right. For example, ext2 is doing directories in the page cache of the directory inode, so there's a cache alias between the block device page cache and the directory inode page cache. > Now I guess I need to see how difficult a patch would be to give > filesystems magic inodes to keep their metadata buffer heads in. Not hard, the block device inode is already a magic inode for metadata buffer heads. You could just make another one attached to the bdev. But, I don't think I fully understand the problem you're trying to solve? -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason <[EMAIL PROTECTED]> writes: > In this case, the commit block isn't allowed to be dirty before reiserfs > decides it is safe to write it. The journal code expects it is the only > spot in the kernel setting buffer heads dirty, and it only does so after > the rest of the log blocks are safely on disk. Ok. So the journal code here fundamentally depends on being able to control the order of the writes, and something else being able to set the buffer head dirty messes up that control. > Given this is a ramdisk, the check can be ignored, but I'd rather not > sprinkle if (ram_disk) into the FS code It makes no sense to implement a ramdisk in a way that requires this. >> At the same time I increasingly don't think we should allow user space >> to dirty or update our filesystem metadata buffer heads. That seems >> like asking for trouble. >> > > Demanding trouble ;) Looks like it. There are even comments in jbd about the same class of problems. Apparently dump and tune2fs on mounted filesystems have triggered some of these issues. The practical question is any of this trouble worth handling. Thinking about it. I don't believe anyone has ever intentionally built a filesystem tool that depends on being able to modify a file systems metadata buffer heads while the filesystem is running, and doing that would seem to be fragile as it would require a lot of cooperation between the tool and the filesystem about how the filesystem uses and implement things. Now I guess I need to see how difficult a patch would be to give filesystems magic inodes to keep their metadata buffer heads in. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 11:57 -0600, Eric W. Biederman wrote: > Christian Borntraeger <[EMAIL PROTECTED]> writes: > > > Eric, > > > > Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger: > >> Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: > >> > >> > fs/buffer.c |3 +++ > >> > 1 files changed, 3 insertions(+), 0 deletions(-) > >> > drivers/block/rd.c | 13 + > >> > 1 files changed, 1 insertions(+), 12 deletions(-) > >> > >> Your patches look sane so far. I have applied both patches, and the > >> problem > >> seems gone. I will try to get these patches to our testers. > >> > >> As long as they dont find new problems: > > > > Our testers did only a short test, and then they were stopped by problems > > with > > reiserfs. At the moment I cannot say for sure if your patch caused this, > > but > > we got the following BUG > > Thanks. > > > ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr > > storage. > > [ cut here ] > > kernel BUG at > > /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117! > > illegal operation: 0001 [#1] > > Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur > > CPU:3Not tainted > > Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88) > > Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs]) > >R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 > > Krnl GPRS: 0002 7411b5c8 002b > >7b04d000 0001 76d1de00 > >7513eec0 0003 0012 77f77680 > >7411b608 fb343b7e fb34404a 7513ee50 > > Krnl Code: fb344374: a7210002 tmll%r2,2 > >fb344378: a7840004 brc 8,fb344380 > >fb34437c: a7f40001 brc 15,fb34437e > > >fb344380: 5810d8c2 l %r1,2242(%r13) > >fb344384: 5820b03c l %r2,60(%r11) > >fb344388: 0de1 basr%r14,%r1 > >fb34438a: 5810d90e l %r1,2318(%r13) > >fb34438e: 5820b03c l %r2,60(%r11) > > > > > > Looking at the code, this really seems related to dirty buffers, so your > > patch > > is the main suspect at the moment. > > Sounds reasonable. > > > if (!barrier) { > > /* If there was a write error in the journal - we can't > > commit > > * this transaction - it will be invalid and, if successful, > > * will just end up propagating the write error out to > > * the file system. */ > > if (likely(!retval && !reiserfs_is_journal_aborted > > (journal))) { > > if (buffer_dirty(jl->j_commit_bh)) > > 1117>BUG(); > > mark_buffer_dirty(jl->j_commit_bh) ; > > sync_dirty_buffer(jl->j_commit_bh) ; > > } > > } > > Grr. I'm not certain how to call that. > > Given that I should also be able to trigger this case by writing to > the block device through the buffer cache (to the write spot at the > write moment) this feels like a reiserfs bug. > Although I feel > screaming about filesystems that go BUG instead of remount read-only > In this case, the commit block isn't allowed to be dirty before reiserfs decides it is safe to write it. The journal code expects it is the only spot in the kernel setting buffer heads dirty, and it only does so after the rest of the log blocks are safely on disk. Given this is a ramdisk, the check can be ignored, but I'd rather not sprinkle if (ram_disk) into the FS code > At the same time I increasingly don't think we should allow user space > to dirty or update our filesystem metadata buffer heads. That seems > like asking for trouble. > Demanding trouble ;) -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger <[EMAIL PROTECTED]> writes: > Eric, > > Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger: >> Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: >> >> > fs/buffer.c | 3 +++ >> > 1 files changed, 3 insertions(+), 0 deletions(-) >> > drivers/block/rd.c | 13 + >> > 1 files changed, 1 insertions(+), 12 deletions(-) >> >> Your patches look sane so far. I have applied both patches, and the problem >> seems gone. I will try to get these patches to our testers. >> >> As long as they dont find new problems: > > Our testers did only a short test, and then they were stopped by problems with > reiserfs. At the moment I cannot say for sure if your patch caused this, but > we got the following BUG Thanks. > ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr > storage. > [ cut here ] > kernel BUG at > /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117! > illegal operation: 0001 [#1] > Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur > CPU:3Not tainted > Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88) > Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs]) >R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 > Krnl GPRS: 0002 7411b5c8 002b >7b04d000 0001 76d1de00 >7513eec0 0003 0012 77f77680 >7411b608 fb343b7e fb34404a 7513ee50 > Krnl Code: fb344374: a7210002 tmll%r2,2 >fb344378: a7840004 brc 8,fb344380 >fb34437c: a7f40001 brc 15,fb34437e > >fb344380: 5810d8c2 l %r1,2242(%r13) >fb344384: 5820b03c l %r2,60(%r11) >fb344388: 0de1 basr%r14,%r1 >fb34438a: 5810d90e l %r1,2318(%r13) >fb34438e: 5820b03c l %r2,60(%r11) > > > Looking at the code, this really seems related to dirty buffers, so your patch > is the main suspect at the moment. Sounds reasonable. > if (!barrier) { > /* If there was a write error in the journal - we can't commit > * this transaction - it will be invalid and, if successful, > * will just end up propagating the write error out to > * the file system. */ > if (likely(!retval && !reiserfs_is_journal_aborted > (journal))) { > if (buffer_dirty(jl->j_commit_bh)) > 1117>BUG(); > mark_buffer_dirty(jl->j_commit_bh) ; > sync_dirty_buffer(jl->j_commit_bh) ; > } > } Grr. I'm not certain how to call that. Given that I should also be able to trigger this case by writing to the block device through the buffer cache (to the write spot at the write moment) this feels like a reiserfs bug. Although I feel screaming about filesystems that go BUG instead of remount read-only At the same time I increasingly don't think we should allow user space to dirty or update our filesystem metadata buffer heads. That seems like asking for trouble. Did you have both of my changes applied? To init_page_buffer() and to the ramdisk_set_dirty_page? I'm guessing it was the change to ramdisk_set_dirty_page Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Eric, Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger: > Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: > > > fs/buffer.c | 3 +++ > > 1 files changed, 3 insertions(+), 0 deletions(-) > > drivers/block/rd.c | 13 + > > 1 files changed, 1 insertions(+), 12 deletions(-) > > Your patches look sane so far. I have applied both patches, and the problem > seems gone. I will try to get these patches to our testers. > > As long as they dont find new problems: Our testers did only a short test, and then they were stopped by problems with reiserfs. At the moment I cannot say for sure if your patch caused this, but we got the following BUG ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr storage. [ cut here ] kernel BUG at /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117! illegal operation: 0001 [#1] Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur CPU:3Not tainted Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88) Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 Krnl GPRS: 0002 7411b5c8 002b 7b04d000 0001 76d1de00 7513eec0 0003 0012 77f77680 7411b608 fb343b7e fb34404a 7513ee50 Krnl Code: fb344374: a7210002 tmll%r2,2 fb344378: a7840004 brc 8,fb344380 fb34437c: a7f40001 brc 15,fb34437e >fb344380: 5810d8c2 l %r1,2242(%r13) fb344384: 5820b03c l %r2,60(%r11) fb344388: 0de1 basr%r14,%r1 fb34438a: 5810d90e l %r1,2318(%r13) fb34438e: 5820b03c l %r2,60(%r11) Looking at the code, this really seems related to dirty buffers, so your patch is the main suspect at the moment. if (!barrier) { /* If there was a write error in the journal - we can't commit * this transaction - it will be invalid and, if successful, * will just end up propagating the write error out to * the file system. */ if (likely(!retval && !reiserfs_is_journal_aborted (journal))) { if (buffer_dirty(jl->j_commit_bh)) 1117>BUG(); mark_buffer_dirty(jl->j_commit_bh) ; sync_dirty_buffer(jl->j_commit_bh) ; } } Christian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger [EMAIL PROTECTED] writes: Eric, Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger: Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: fs/buffer.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) Your patches look sane so far. I have applied both patches, and the problem seems gone. I will try to get these patches to our testers. As long as they dont find new problems: Our testers did only a short test, and then they were stopped by problems with reiserfs. At the moment I cannot say for sure if your patch caused this, but we got the following BUG Thanks. ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr storage. [ cut here ] kernel BUG at /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117! illegal operation: 0001 [#1] Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur CPU:3Not tainted Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88) Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 Krnl GPRS: 0002 7411b5c8 002b 7b04d000 0001 76d1de00 7513eec0 0003 0012 77f77680 7411b608 fb343b7e fb34404a 7513ee50 Krnl Code: fb344374: a7210002 tmll%r2,2 fb344378: a7840004 brc 8,fb344380 fb34437c: a7f40001 brc 15,fb34437e fb344380: 5810d8c2 l %r1,2242(%r13) fb344384: 5820b03c l %r2,60(%r11) fb344388: 0de1 basr%r14,%r1 fb34438a: 5810d90e l %r1,2318(%r13) fb34438e: 5820b03c l %r2,60(%r11) Looking at the code, this really seems related to dirty buffers, so your patch is the main suspect at the moment. Sounds reasonable. if (!barrier) { /* If there was a write error in the journal - we can't commit * this transaction - it will be invalid and, if successful, * will just end up propagating the write error out to * the file system. */ if (likely(!retval !reiserfs_is_journal_aborted (journal))) { if (buffer_dirty(jl-j_commit_bh)) 1117BUG(); mark_buffer_dirty(jl-j_commit_bh) ; sync_dirty_buffer(jl-j_commit_bh) ; } } Grr. I'm not certain how to call that. Given that I should also be able to trigger this case by writing to the block device through the buffer cache (to the write spot at the write moment) this feels like a reiserfs bug. Although I feel screaming about filesystems that go BUG instead of remount read-only At the same time I increasingly don't think we should allow user space to dirty or update our filesystem metadata buffer heads. That seems like asking for trouble. Did you have both of my changes applied? To init_page_buffer() and to the ramdisk_set_dirty_page? I'm guessing it was the change to ramdisk_set_dirty_page Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 11:57 -0600, Eric W. Biederman wrote: Christian Borntraeger [EMAIL PROTECTED] writes: Eric, Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger: Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: fs/buffer.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) Your patches look sane so far. I have applied both patches, and the problem seems gone. I will try to get these patches to our testers. As long as they dont find new problems: Our testers did only a short test, and then they were stopped by problems with reiserfs. At the moment I cannot say for sure if your patch caused this, but we got the following BUG Thanks. ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr storage. [ cut here ] kernel BUG at /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117! illegal operation: 0001 [#1] Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur CPU:3Not tainted Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88) Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 Krnl GPRS: 0002 7411b5c8 002b 7b04d000 0001 76d1de00 7513eec0 0003 0012 77f77680 7411b608 fb343b7e fb34404a 7513ee50 Krnl Code: fb344374: a7210002 tmll%r2,2 fb344378: a7840004 brc 8,fb344380 fb34437c: a7f40001 brc 15,fb34437e fb344380: 5810d8c2 l %r1,2242(%r13) fb344384: 5820b03c l %r2,60(%r11) fb344388: 0de1 basr%r14,%r1 fb34438a: 5810d90e l %r1,2318(%r13) fb34438e: 5820b03c l %r2,60(%r11) Looking at the code, this really seems related to dirty buffers, so your patch is the main suspect at the moment. Sounds reasonable. if (!barrier) { /* If there was a write error in the journal - we can't commit * this transaction - it will be invalid and, if successful, * will just end up propagating the write error out to * the file system. */ if (likely(!retval !reiserfs_is_journal_aborted (journal))) { if (buffer_dirty(jl-j_commit_bh)) 1117BUG(); mark_buffer_dirty(jl-j_commit_bh) ; sync_dirty_buffer(jl-j_commit_bh) ; } } Grr. I'm not certain how to call that. Given that I should also be able to trigger this case by writing to the block device through the buffer cache (to the write spot at the write moment) this feels like a reiserfs bug. Although I feel screaming about filesystems that go BUG instead of remount read-only In this case, the commit block isn't allowed to be dirty before reiserfs decides it is safe to write it. The journal code expects it is the only spot in the kernel setting buffer heads dirty, and it only does so after the rest of the log blocks are safely on disk. Given this is a ramdisk, the check can be ignored, but I'd rather not sprinkle if (ram_disk) into the FS code At the same time I increasingly don't think we should allow user space to dirty or update our filesystem metadata buffer heads. That seems like asking for trouble. Demanding trouble ;) -chris - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason [EMAIL PROTECTED] writes: In this case, the commit block isn't allowed to be dirty before reiserfs decides it is safe to write it. The journal code expects it is the only spot in the kernel setting buffer heads dirty, and it only does so after the rest of the log blocks are safely on disk. Ok. So the journal code here fundamentally depends on being able to control the order of the writes, and something else being able to set the buffer head dirty messes up that control. Given this is a ramdisk, the check can be ignored, but I'd rather not sprinkle if (ram_disk) into the FS code It makes no sense to implement a ramdisk in a way that requires this. At the same time I increasingly don't think we should allow user space to dirty or update our filesystem metadata buffer heads. That seems like asking for trouble. Demanding trouble ;) Looks like it. There are even comments in jbd about the same class of problems. Apparently dump and tune2fs on mounted filesystems have triggered some of these issues. The practical question is any of this trouble worth handling. Thinking about it. I don't believe anyone has ever intentionally built a filesystem tool that depends on being able to modify a file systems metadata buffer heads while the filesystem is running, and doing that would seem to be fragile as it would require a lot of cooperation between the tool and the filesystem about how the filesystem uses and implement things. Now I guess I need to see how difficult a patch would be to give filesystems magic inodes to keep their metadata buffer heads in. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 14:29 -0600, Eric W. Biederman wrote: Chris Mason [EMAIL PROTECTED] writes: In this case, the commit block isn't allowed to be dirty before reiserfs decides it is safe to write it. The journal code expects it is the only spot in the kernel setting buffer heads dirty, and it only does so after the rest of the log blocks are safely on disk. Ok. So the journal code here fundamentally depends on being able to control the order of the writes, and something else being able to set the buffer head dirty messes up that control. Right. At the same time I increasingly don't think we should allow user space to dirty or update our filesystem metadata buffer heads. That seems like asking for trouble. Demanding trouble ;) Looks like it. There are even comments in jbd about the same class of problems. Apparently dump and tune2fs on mounted filesystems have triggered some of these issues. The practical question is any of this trouble worth handling. Thinking about it. I don't believe anyone has ever intentionally built a filesystem tool that depends on being able to modify a file systems metadata buffer heads while the filesystem is running, and doing that would seem to be fragile as it would require a lot of cooperation between the tool and the filesystem about how the filesystem uses and implement things. That's right. For example, ext2 is doing directories in the page cache of the directory inode, so there's a cache alias between the block device page cache and the directory inode page cache. Now I guess I need to see how difficult a patch would be to give filesystems magic inodes to keep their metadata buffer heads in. Not hard, the block device inode is already a magic inode for metadata buffer heads. You could just make another one attached to the bdev. But, I don't think I fully understand the problem you're trying to solve? -chris - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason [EMAIL PROTECTED] writes: Thinking about it. I don't believe anyone has ever intentionally built a filesystem tool that depends on being able to modify a file systems metadata buffer heads while the filesystem is running, and doing that would seem to be fragile as it would require a lot of cooperation between the tool and the filesystem about how the filesystem uses and implement things. That's right. For example, ext2 is doing directories in the page cache of the directory inode, so there's a cache alias between the block device page cache and the directory inode page cache. Now I guess I need to see how difficult a patch would be to give filesystems magic inodes to keep their metadata buffer heads in. Not hard, the block device inode is already a magic inode for metadata buffer heads. You could just make another one attached to the bdev. But, I don't think I fully understand the problem you're trying to solve? So the start: When we write buffers from the buffer cache we clear buffer_dirty but not PageDirty So try_to_free_buffers() will mark any page with clean buffer_heads that is not clean itself clean. The ramdisk set pages dirty to keep them from being removed from the page cache, just like ramfs. Unfortunately when those dirty ramdisk pages get buffers on them and those buffers all go clean and we are trying to reclaim buffer_heads we drop those pages from the page cache. Ouch! We can fix the ramdisk by setting making certain that buffer_heads on ramdisk pages stay dirty as well. The problem is this breaks filesystems like reiserfs and ext3 that expect to be able to make buffer_heads clean sometimes. There are other ways to solve this for ramdisks, such as changing where ramdisks are stored. However fixing the ramdisks this way still leaves the general problem that there are other paths to the filesystem metadata buffers, and those other paths cause the code to be complicated and buggy. So I'm trying to see if we can untangle this Gordian knot, so the code because more easily maintainable. To make the buffer cache a helper library instead of require infrastructure, it looks like two things need to happen. - Move metadata buffer heads off block devices page cache entries, - Communicate the mappings of data pages to block device sectors in a generic way without buffer heads. How we ultimately fix the ramdisk tends to depend on how we untangle the buffer head problem. Right now the only simple solution is to suppress try_to_free_buffers, which is a bit ugly. We can also come up with a completely separate store for the pages in the buffer cache but if we wind up moving the metadata buffer heads anyway then that should not be necessary. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman: Did you have both of my changes applied? To init_page_buffer() and to the ramdisk_set_dirty_page? Yes, I removed my patch and applied both patches from you. Christian - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger [EMAIL PROTECTED] writes: Am Mittwoch, 17. Oktober 2007 schrieb Eric W. Biederman: Did you have both of my changes applied? To init_page_buffer() and to the ramdisk_set_dirty_page? Yes, I removed my patch and applied both patches from you. Thanks. Grr. Inconsistent rules on a core piece of infrastructure. It looks like that if there is any trivial/minimal fix it is based on your patch suppressing try_to_free_buffers. Ugh. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 15:30 -0600, Eric W. Biederman wrote: Chris Mason [EMAIL PROTECTED] writes: Thinking about it. I don't believe anyone has ever intentionally built a filesystem tool that depends on being able to modify a file systems metadata buffer heads while the filesystem is running, and doing that would seem to be fragile as it would require a lot of cooperation between the tool and the filesystem about how the filesystem uses and implement things. That's right. For example, ext2 is doing directories in the page cache of the directory inode, so there's a cache alias between the block device page cache and the directory inode page cache. Now I guess I need to see how difficult a patch would be to give filesystems magic inodes to keep their metadata buffer heads in. Not hard, the block device inode is already a magic inode for metadata buffer heads. You could just make another one attached to the bdev. But, I don't think I fully understand the problem you're trying to solve? So the start: When we write buffers from the buffer cache we clear buffer_dirty but not PageDirty So try_to_free_buffers() will mark any page with clean buffer_heads that is not clean itself clean. The ramdisk set pages dirty to keep them from being removed from the page cache, just like ramfs. So, the problem is using the Dirty bit to indicate pinned. You're completely right that our current setup of buffer heads and pages and filesystem metadata is complex and difficult. But, moving the buffer heads off of the page cache pages isn't going to make it any easier to use dirty as pinned, especially in the face of buffer_head users for file data pages. You've already seen Nick fsblock code, but you can see my general approach to replacing buffer heads here: http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h (alpha quality implementation in extent_map.c and users in inode.c) The basic idea is to do extent based record keeping for mapping and state of things in the filesystem, and to avoid attaching these things to the page. Unfortunately when those dirty ramdisk pages get buffers on them and those buffers all go clean and we are trying to reclaim buffer_heads we drop those pages from the page cache. Ouch! We can fix the ramdisk by setting making certain that buffer_heads on ramdisk pages stay dirty as well. The problem is this breaks filesystems like reiserfs and ext3 that expect to be able to make buffer_heads clean sometimes. There are other ways to solve this for ramdisks, such as changing where ramdisks are stored. However fixing the ramdisks this way still leaves the general problem that there are other paths to the filesystem metadata buffers, and those other paths cause the code to be complicated and buggy. So I'm trying to see if we can untangle this Gordian knot, so the code because more easily maintainable. Don't get me wrong, I'd love to see a simple and coherent fix for what reiserfs and ext3 do with buffer head state, but I think for the short term you're best off pinning the ramdisk pages via some other means. -chris - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason [EMAIL PROTECTED] writes: So, the problem is using the Dirty bit to indicate pinned. You're completely right that our current setup of buffer heads and pages and filesystpem metadata is complex and difficult. But, moving the buffer heads off of the page cache pages isn't going to make it any easier to use dirty as pinned, especially in the face of buffer_head users for file data pages. Let me specific. Not moving buffer_heads off of page cache pages, moving buffer_heads off of the block devices page cache pages. My problem is the coupling of how block devices are cached and the implementation of buffer heads, and by removing that coupling we can generally make things better. Currently that coupling means silly things like all block devices are cached in low memory. Which probably isn't what you want if you actually have a use for block devices. For the ramdisk case in particular what this means is that there are no more users that create buffer_head mappings on the block device cache so using the dirty bit will be safe. Further it removes the nasty possibility of user space messing with metadata buffer head state. So the only way those cases can happen is a code bug, or a hardware bug. So I think by removing these unnecessary code paths things will become easier to work with. You've already seen Nick fsblock code, but you can see my general approach to replacing buffer heads here: http://oss.oracle.com/mercurial/mason/btrfs-unstable/file/f89e7971692f/extent_map.h (alpha quality implementation in extent_map.c and users in inode.c) The basic idea is to do extent based record keeping for mapping and state of things in the filesystem, and to avoid attaching these things to the page. Interesting. Something to dig into. Don't get me wrong, I'd love to see a simple and coherent fix for what reiserfs and ext3 do with buffer head state, but I think for the short term you're best off pinning the ramdisk pages via some other means. Yes. And the problem is hard enough to trigger that a short term fix is actually of debatable value. The reason this hasn't shown up more frequently is that it only ever triggers if you are in the buffer head reclaim state, which on a 64bit box means you have to use 4K buffers and have your ram cache another block device. That plus most people use initramfs these days. For the short term we have Christian's other patch which simply disables calling try_to_free_buffers. Although that really feels like a hack to me. For 2.6.25 I think I have a shot at fixing these things cleanly. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote: Chris Mason [EMAIL PROTECTED] writes: So, the problem is using the Dirty bit to indicate pinned. You're completely right that our current setup of buffer heads and pages and filesystpem metadata is complex and difficult. But, moving the buffer heads off of the page cache pages isn't going to make it any easier to use dirty as pinned, especially in the face of buffer_head users for file data pages. Let me specific. Not moving buffer_heads off of page cache pages, moving buffer_heads off of the block devices page cache pages. My problem is the coupling of how block devices are cached and the implementation of buffer heads, and by removing that coupling we can generally make things better. Currently that coupling means silly things like all block devices are cached in low memory. Which probably isn't what you want if you actually have a use for block devices. For the ramdisk case in particular what this means is that there are no more users that create buffer_head mappings on the block device cache so using the dirty bit will be safe. Ok, we move the buffer heads off to a different inode, and that indoe has pages. The pages on the inode still need to get pinned, how does that pinning happen? The problem you described where someone cleans a page because the buffer heads are clean happens already without help from userland. So, keeping the pages away from userland won't save them from cleaning. Sorry if I'm reading your suggestion wrong... -chris - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Chris Mason [EMAIL PROTECTED] writes: On Wed, 2007-10-17 at 17:28 -0600, Eric W. Biederman wrote: Chris Mason [EMAIL PROTECTED] writes: So, the problem is using the Dirty bit to indicate pinned. You're completely right that our current setup of buffer heads and pages and filesystpem metadata is complex and difficult. But, moving the buffer heads off of the page cache pages isn't going to make it any easier to use dirty as pinned, especially in the face of buffer_head users for file data pages. Let me specific. Not moving buffer_heads off of page cache pages, moving buffer_heads off of the block devices page cache pages. My problem is the coupling of how block devices are cached and the implementation of buffer heads, and by removing that coupling we can generally make things better. Currently that coupling means silly things like all block devices are cached in low memory. Which probably isn't what you want if you actually have a use for block devices. For the ramdisk case in particular what this means is that there are no more users that create buffer_head mappings on the block device cache so using the dirty bit will be safe. Ok, we move the buffer heads off to a different inode, and that indoe has pages. The pages on the inode still need to get pinned, how does that pinning happen? The problem you described where someone cleans a page because the buffer heads are clean happens already without help from userland. So, keeping the pages away from userland won't save them from cleaning. Sorry if I'm reading your suggestion wrong... Yes. I was suggesting to continue to pin the pages for the page cache pages block device inode, and having the buffer cache live on some other inode. Thus not causing me problems by adding clean buffer_heads to my dirty pages. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Eric, Am Dienstag, 16. Oktober 2007 schrieb Christian Borntraeger: Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: fs/buffer.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) Your patches look sane so far. I have applied both patches, and the problem seems gone. I will try to get these patches to our testers. As long as they dont find new problems: Our testers did only a short test, and then they were stopped by problems with reiserfs. At the moment I cannot say for sure if your patch caused this, but we got the following BUG ReiserFS: ram0: warning: Created .reiserfs_priv on ram0 - reserved for xattr storage. [ cut here ] kernel BUG at /home/autobuild/BUILD/linux-2.6.23-20071017/fs/reiserfs/journal.c:1117! illegal operation: 0001 [#1] Modules linked in: reiserfs dm_multipath sunrpc dm_mod qeth ccwgroup vmur CPU:3Not tainted Process reiserfs/3 (pid: 2592, task: 77dac418, ksp: 7513ee88) Krnl PSW : 070c3000 fb344380 (flush_commit_list+0x808/0x95c [reiserfs]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 Krnl GPRS: 0002 7411b5c8 002b 7b04d000 0001 76d1de00 7513eec0 0003 0012 77f77680 7411b608 fb343b7e fb34404a 7513ee50 Krnl Code: fb344374: a7210002 tmll%r2,2 fb344378: a7840004 brc 8,fb344380 fb34437c: a7f40001 brc 15,fb34437e fb344380: 5810d8c2 l %r1,2242(%r13) fb344384: 5820b03c l %r2,60(%r11) fb344388: 0de1 basr%r14,%r1 fb34438a: 5810d90e l %r1,2318(%r13) fb34438e: 5820b03c l %r2,60(%r11) Looking at the code, this really seems related to dirty buffers, so your patch is the main suspect at the moment. if (!barrier) { /* If there was a write error in the journal - we can't commit * this transaction - it will be invalid and, if successful, * will just end up propagating the write error out to * the file system. */ if (likely(!retval !reiserfs_is_journal_aborted (journal))) { if (buffer_dirty(jl-j_commit_bh)) 1117BUG(); mark_buffer_dirty(jl-j_commit_bh) ; sync_dirty_buffer(jl-j_commit_bh) ; } } Christian - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wednesday 17 October 2007 05:06, Eric W. Biederman wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > > On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: > >> I have not observed this case but it is possible to get a dirty page > >> cache with clean buffer heads if we get a clean ramdisk page with > >> buffer heads generated by a filesystem calling __getblk and then write > >> to that page from user space through the block device. Then we just > >> need to hit the proper window and try_to_free_buffers() will mark that > >> page clean and eventually drop it. Ouch! > >> > >> To fix this use the generic __set_page_dirty_buffers in the ramdisk > >> code so that when we mark a page dirty we also mark it's buffer heads > >> dirty. > > > > Hmm, so we can also have some filesystems writing their own buffers > > out by hand (clear_buffer_dirty, submit buffer for IO). Other places > > will do similarly dodgy things with filesystem metadata > > (fsync_buffers_list, for example). > > > > So your buffers get cleaned again, then your pages get cleaned. > > So I just took a little bit of time to look at and think about > the path you are referring to, and I don't see a problem. > > The rule with the buffer dirty bit is that you first clear it > and then you submit the write. When the write request finally > makes it's way to rd.c we copy the data if necessary and call > set_page_dirty. Which will then mark the page and the buffer > dirty again. Oh, maybe you're right. I didn't see it redirty the page there. > In essence the ramdisk code just attempts to lock buffers in > ram by setting their dirty bit, just like we do for pages > in ramfs. Yeah, which is half the reason why its so complicated. Logically it should just hold another reference on the pages rather than interfere with pagecache state, but it can't do that because it doesn't always know when a new page is inserted. > > While I said it was a good fix when I saw the patch earlier, I think > > it's not closing the entire hole, and as such, Christian's patch is > > probably the way to go for stable. > > I thought through the logic in try_to_free_buffers and it actually > makes sense to me now. mark_buffer_dirty sets the page dirty bit > so dirty buffers reside on dirty pages. When we submit I/O we > aren't guaranteed to submit all of the dirty buffers for a page > at once, so we don't clear the page dirty bit. With the result > that we can end up with pages with the dirty bit set but all of > their buffers are clean. Yep. > Since we rather deliberately allow truly clean pages to be dropped > from the ramdisk overriding try_to_free_buffer_pages looks wrong > because then except for invalidate we can not remove buffers > from truly clean pages. Yeah, if your fix works I guess it is better to use it and converge code rather than diverge it even more. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Nick Piggin <[EMAIL PROTECTED]> writes: > On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: >> I have not observed this case but it is possible to get a dirty page >> cache with clean buffer heads if we get a clean ramdisk page with >> buffer heads generated by a filesystem calling __getblk and then write >> to that page from user space through the block device. Then we just >> need to hit the proper window and try_to_free_buffers() will mark that >> page clean and eventually drop it. Ouch! >> >> To fix this use the generic __set_page_dirty_buffers in the ramdisk >> code so that when we mark a page dirty we also mark it's buffer heads >> dirty. > > Hmm, so we can also have some filesystems writing their own buffers > out by hand (clear_buffer_dirty, submit buffer for IO). Other places > will do similarly dodgy things with filesystem metadata > (fsync_buffers_list, for example). > > So your buffers get cleaned again, then your pages get cleaned. So I just took a little bit of time to look at and think about the path you are referring to, and I don't see a problem. The rule with the buffer dirty bit is that you first clear it and then you submit the write. When the write request finally makes it's way to rd.c we copy the data if necessary and call set_page_dirty. Which will then mark the page and the buffer dirty again. In essence the ramdisk code just attempts to lock buffers in ram by setting their dirty bit, just like we do for pages in ramfs. The only case where I see the dirty bit getting cleared without submitting I/O is when dirty state doesn't matter, in which case if we get a page full buffers all of whose data we don't care about it is legitimate to drop the page. As for ramdisk_writepage it probably makes sense to remove it, as the generic code paths seem to work as well or better the writepage method is NULL. However if we do keep it we should call set_page_dirty there on general principles. > While I said it was a good fix when I saw the patch earlier, I think > it's not closing the entire hole, and as such, Christian's patch is > probably the way to go for stable. I thought through the logic in try_to_free_buffers and it actually makes sense to me now. mark_buffer_dirty sets the page dirty bit so dirty buffers reside on dirty pages. When we submit I/O we aren't guaranteed to submit all of the dirty buffers for a page at once, so we don't clear the page dirty bit. With the result that we can end up with pages with the dirty bit set but all of their buffers are clean. Since we rather deliberately allow truly clean pages to be dropped from the ramdisk overriding try_to_free_buffer_pages looks wrong because then except for invalidate we can not remove buffers from truly clean pages. > For mainline, *if* we want to keep the old rd.c around at all, I > don't see any harm in this patch so long as Christian's is merged > as well. Sharing common code is always good. I do agree that with the amount of code duplication in the buffer cache that locking pages into the buffer cache seems very error prone, and difficult to maintain. So rewriting rd.c to store it's pages elsewhere sounds very promising. While I can see Christian's patch as fixing the symptom. I have a very hard time see it as correct. If we did something more elaborate to replace try_to_free_buffer_pages such that we could drop buffers from clean buffer cache pages when they became such and so were only suppressing the drop the dirty bit logic for pages that contain data we want to keep I would be ok with it.Just totally skipping buffer head freeing just feels wrong to me. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger <[EMAIL PROTECTED]> writes: > Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: > >> fs/buffer.c | 3 +++ >> 1 files changed, 3 insertions(+), 0 deletions(-) >> drivers/block/rd.c | 13 + >> 1 files changed, 1 insertions(+), 12 deletions(-) > > Your patches look sane so far. I have applied both patches, and the problem > seems gone. I will try to get these patches to our testers. > > As long as they dont find new problems: Sounds good. Please make certain to test reiserfs as well as ext2+ as it seems to strain the ramdisk code more aggressively. > Acked-by: Christian Borntraeger <[EMAIL PROTECTED]> > > Nick, Eric. What is the best patch for the stable series? Both patches from > Eric or mine? I CCed stable, so they know that something is coming. My gut feel says my patches assuming everything tests out ok, as they actually fix the problem instead of papering over it, and there isn't really a size difference. In addition the change to init_page_buffers is interesting all by itself. With that patch we now have the option of running block devices in mode where we don't bother to cache the buffer heads which should reduce memory pressure a bit. Not that an enhancement like that will show up in stable, but... Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Dienstag, 16. Oktober 2007 schrieb Nick Piggin: > While I said it was a good fix when I saw the patch earlier, I think > it's not closing the entire hole, and as such, Christian's patch is > probably the way to go for stable. That would be my preferred method. Merge Erics and my fixup for 2.6.24-rc. The only open questions is, what was the reiserfs problem? Is it still causes by Erics patches? > For mainline, *if* we want to keep the old rd.c around at all, I > don't see any harm in this patch so long as Christian's is merged > as well. Sharing common code is always good. While the merge window is still open, to me the new ramdisk code seems more like a 2.6.25-rc thing. We actually should test the behaviour in low memory scenarios. What do you think? Christian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty
hi!to everybody i need sex video -- This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty
hi! i need sex video -- This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: > I have not observed this case but it is possible to get a dirty page > cache with clean buffer heads if we get a clean ramdisk page with > buffer heads generated by a filesystem calling __getblk and then write > to that page from user space through the block device. Then we just > need to hit the proper window and try_to_free_buffers() will mark that > page clean and eventually drop it. Ouch! > > To fix this use the generic __set_page_dirty_buffers in the ramdisk > code so that when we mark a page dirty we also mark it's buffer heads > dirty. Hmm, so we can also have some filesystems writing their own buffers out by hand (clear_buffer_dirty, submit buffer for IO). Other places will do similarly dodgy things with filesystem metadata (fsync_buffers_list, for example). So your buffers get cleaned again, then your pages get cleaned. While I said it was a good fix when I saw the patch earlier, I think it's not closing the entire hole, and as such, Christian's patch is probably the way to go for stable. For mainline, *if* we want to keep the old rd.c around at all, I don't see any harm in this patch so long as Christian's is merged as well. Sharing common code is always good. > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> > --- > drivers/block/rd.c | 13 + > 1 files changed, 1 insertions(+), 12 deletions(-) > > diff --git a/drivers/block/rd.c b/drivers/block/rd.c > index 701ea77..84163da 100644 > --- a/drivers/block/rd.c > +++ b/drivers/block/rd.c > @@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space > *mapping, return 0; > } > > -/* > - * ramdisk blockdev pages have their own ->set_page_dirty() because we > don't - * want them to contribute to dirty memory accounting. > - */ > -static int ramdisk_set_page_dirty(struct page *page) > -{ > - if (!TestSetPageDirty(page)) > - return 1; > - return 0; > -} > - > static const struct address_space_operations ramdisk_aops = { > .readpage = ramdisk_readpage, > .prepare_write = ramdisk_prepare_write, > .commit_write = ramdisk_commit_write, > .writepage = ramdisk_writepage, > - .set_page_dirty = ramdisk_set_page_dirty, > + .set_page_dirty = __set_page_dirty_buffers, > .writepages = ramdisk_writepages, > }; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: > fs/buffer.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > drivers/block/rd.c | 13 + > 1 files changed, 1 insertions(+), 12 deletions(-) Your patches look sane so far. I have applied both patches, and the problem seems gone. I will try to get these patches to our testers. As long as they dont find new problems: Acked-by: Christian Borntraeger <[EMAIL PROTECTED]> Nick, Eric. What is the best patch for the stable series? Both patches from Eric or mine? I CCed stable, so they know that something is coming. Christian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: fs/buffer.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) Your patches look sane so far. I have applied both patches, and the problem seems gone. I will try to get these patches to our testers. As long as they dont find new problems: Acked-by: Christian Borntraeger [EMAIL PROTECTED] Nick, Eric. What is the best patch for the stable series? Both patches from Eric or mine? I CCed stable, so they know that something is coming. Christian - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty
hi! i need sex video -- This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: I have not observed this case but it is possible to get a dirty page cache with clean buffer heads if we get a clean ramdisk page with buffer heads generated by a filesystem calling __getblk and then write to that page from user space through the block device. Then we just need to hit the proper window and try_to_free_buffers() will mark that page clean and eventually drop it. Ouch! To fix this use the generic __set_page_dirty_buffers in the ramdisk code so that when we mark a page dirty we also mark it's buffer heads dirty. Hmm, so we can also have some filesystems writing their own buffers out by hand (clear_buffer_dirty, submit buffer for IO). Other places will do similarly dodgy things with filesystem metadata (fsync_buffers_list, for example). So your buffers get cleaned again, then your pages get cleaned. While I said it was a good fix when I saw the patch earlier, I think it's not closing the entire hole, and as such, Christian's patch is probably the way to go for stable. For mainline, *if* we want to keep the old rd.c around at all, I don't see any harm in this patch so long as Christian's is merged as well. Sharing common code is always good. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) diff --git a/drivers/block/rd.c b/drivers/block/rd.c index 701ea77..84163da 100644 --- a/drivers/block/rd.c +++ b/drivers/block/rd.c @@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space *mapping, return 0; } -/* - * ramdisk blockdev pages have their own -set_page_dirty() because we don't - * want them to contribute to dirty memory accounting. - */ -static int ramdisk_set_page_dirty(struct page *page) -{ - if (!TestSetPageDirty(page)) - return 1; - return 0; -} - static const struct address_space_operations ramdisk_aops = { .readpage = ramdisk_readpage, .prepare_write = ramdisk_prepare_write, .commit_write = ramdisk_commit_write, .writepage = ramdisk_writepage, - .set_page_dirty = ramdisk_set_page_dirty, + .set_page_dirty = __set_page_dirty_buffers, .writepages = ramdisk_writepages, }; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] rd: Mark ramdisk buffers heads dirty
hi!to everybody i need sex video -- This message was sent on behalf of [EMAIL PROTECTED] at openSubscriber.com http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/7788114.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Am Dienstag, 16. Oktober 2007 schrieb Nick Piggin: While I said it was a good fix when I saw the patch earlier, I think it's not closing the entire hole, and as such, Christian's patch is probably the way to go for stable. That would be my preferred method. Merge Erics and my fixup for 2.6.24-rc. The only open questions is, what was the reiserfs problem? Is it still causes by Erics patches? For mainline, *if* we want to keep the old rd.c around at all, I don't see any harm in this patch so long as Christian's is merged as well. Sharing common code is always good. While the merge window is still open, to me the new ramdisk code seems more like a 2.6.25-rc thing. We actually should test the behaviour in low memory scenarios. What do you think? Christian - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Christian Borntraeger [EMAIL PROTECTED] writes: Am Dienstag, 16. Oktober 2007 schrieb Eric W. Biederman: fs/buffer.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) Your patches look sane so far. I have applied both patches, and the problem seems gone. I will try to get these patches to our testers. As long as they dont find new problems: Sounds good. Please make certain to test reiserfs as well as ext2+ as it seems to strain the ramdisk code more aggressively. Acked-by: Christian Borntraeger [EMAIL PROTECTED] Nick, Eric. What is the best patch for the stable series? Both patches from Eric or mine? I CCed stable, so they know that something is coming. My gut feel says my patches assuming everything tests out ok, as they actually fix the problem instead of papering over it, and there isn't really a size difference. In addition the change to init_page_buffers is interesting all by itself. With that patch we now have the option of running block devices in mode where we don't bother to cache the buffer heads which should reduce memory pressure a bit. Not that an enhancement like that will show up in stable, but... Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
Nick Piggin [EMAIL PROTECTED] writes: On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: I have not observed this case but it is possible to get a dirty page cache with clean buffer heads if we get a clean ramdisk page with buffer heads generated by a filesystem calling __getblk and then write to that page from user space through the block device. Then we just need to hit the proper window and try_to_free_buffers() will mark that page clean and eventually drop it. Ouch! To fix this use the generic __set_page_dirty_buffers in the ramdisk code so that when we mark a page dirty we also mark it's buffer heads dirty. Hmm, so we can also have some filesystems writing their own buffers out by hand (clear_buffer_dirty, submit buffer for IO). Other places will do similarly dodgy things with filesystem metadata (fsync_buffers_list, for example). So your buffers get cleaned again, then your pages get cleaned. So I just took a little bit of time to look at and think about the path you are referring to, and I don't see a problem. The rule with the buffer dirty bit is that you first clear it and then you submit the write. When the write request finally makes it's way to rd.c we copy the data if necessary and call set_page_dirty. Which will then mark the page and the buffer dirty again. In essence the ramdisk code just attempts to lock buffers in ram by setting their dirty bit, just like we do for pages in ramfs. The only case where I see the dirty bit getting cleared without submitting I/O is when dirty state doesn't matter, in which case if we get a page full buffers all of whose data we don't care about it is legitimate to drop the page. As for ramdisk_writepage it probably makes sense to remove it, as the generic code paths seem to work as well or better the writepage method is NULL. However if we do keep it we should call set_page_dirty there on general principles. While I said it was a good fix when I saw the patch earlier, I think it's not closing the entire hole, and as such, Christian's patch is probably the way to go for stable. I thought through the logic in try_to_free_buffers and it actually makes sense to me now. mark_buffer_dirty sets the page dirty bit so dirty buffers reside on dirty pages. When we submit I/O we aren't guaranteed to submit all of the dirty buffers for a page at once, so we don't clear the page dirty bit. With the result that we can end up with pages with the dirty bit set but all of their buffers are clean. Since we rather deliberately allow truly clean pages to be dropped from the ramdisk overriding try_to_free_buffer_pages looks wrong because then except for invalidate we can not remove buffers from truly clean pages. For mainline, *if* we want to keep the old rd.c around at all, I don't see any harm in this patch so long as Christian's is merged as well. Sharing common code is always good. I do agree that with the amount of code duplication in the buffer cache that locking pages into the buffer cache seems very error prone, and difficult to maintain. So rewriting rd.c to store it's pages elsewhere sounds very promising. While I can see Christian's patch as fixing the symptom. I have a very hard time see it as correct. If we did something more elaborate to replace try_to_free_buffer_pages such that we could drop buffers from clean buffer cache pages when they became such and so were only suppressing the drop the dirty bit logic for pages that contain data we want to keep I would be ok with it.Just totally skipping buffer head freeing just feels wrong to me. Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] rd: Mark ramdisk buffers heads dirty
On Wednesday 17 October 2007 05:06, Eric W. Biederman wrote: Nick Piggin [EMAIL PROTECTED] writes: On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: I have not observed this case but it is possible to get a dirty page cache with clean buffer heads if we get a clean ramdisk page with buffer heads generated by a filesystem calling __getblk and then write to that page from user space through the block device. Then we just need to hit the proper window and try_to_free_buffers() will mark that page clean and eventually drop it. Ouch! To fix this use the generic __set_page_dirty_buffers in the ramdisk code so that when we mark a page dirty we also mark it's buffer heads dirty. Hmm, so we can also have some filesystems writing their own buffers out by hand (clear_buffer_dirty, submit buffer for IO). Other places will do similarly dodgy things with filesystem metadata (fsync_buffers_list, for example). So your buffers get cleaned again, then your pages get cleaned. So I just took a little bit of time to look at and think about the path you are referring to, and I don't see a problem. The rule with the buffer dirty bit is that you first clear it and then you submit the write. When the write request finally makes it's way to rd.c we copy the data if necessary and call set_page_dirty. Which will then mark the page and the buffer dirty again. Oh, maybe you're right. I didn't see it redirty the page there. In essence the ramdisk code just attempts to lock buffers in ram by setting their dirty bit, just like we do for pages in ramfs. Yeah, which is half the reason why its so complicated. Logically it should just hold another reference on the pages rather than interfere with pagecache state, but it can't do that because it doesn't always know when a new page is inserted. While I said it was a good fix when I saw the patch earlier, I think it's not closing the entire hole, and as such, Christian's patch is probably the way to go for stable. I thought through the logic in try_to_free_buffers and it actually makes sense to me now. mark_buffer_dirty sets the page dirty bit so dirty buffers reside on dirty pages. When we submit I/O we aren't guaranteed to submit all of the dirty buffers for a page at once, so we don't clear the page dirty bit. With the result that we can end up with pages with the dirty bit set but all of their buffers are clean. Yep. Since we rather deliberately allow truly clean pages to be dropped from the ramdisk overriding try_to_free_buffer_pages looks wrong because then except for invalidate we can not remove buffers from truly clean pages. Yeah, if your fix works I guess it is better to use it and converge code rather than diverge it even more. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] rd: Mark ramdisk buffers heads dirty
I have not observed this case but it is possible to get a dirty page cache with clean buffer heads if we get a clean ramdisk page with buffer heads generated by a filesystem calling __getblk and then write to that page from user space through the block device. Then we just need to hit the proper window and try_to_free_buffers() will mark that page clean and eventually drop it. Ouch! To fix this use the generic __set_page_dirty_buffers in the ramdisk code so that when we mark a page dirty we also mark it's buffer heads dirty. Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> --- drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) diff --git a/drivers/block/rd.c b/drivers/block/rd.c index 701ea77..84163da 100644 --- a/drivers/block/rd.c +++ b/drivers/block/rd.c @@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space *mapping, return 0; } -/* - * ramdisk blockdev pages have their own ->set_page_dirty() because we don't - * want them to contribute to dirty memory accounting. - */ -static int ramdisk_set_page_dirty(struct page *page) -{ - if (!TestSetPageDirty(page)) - return 1; - return 0; -} - static const struct address_space_operations ramdisk_aops = { .readpage = ramdisk_readpage, .prepare_write = ramdisk_prepare_write, .commit_write = ramdisk_commit_write, .writepage = ramdisk_writepage, - .set_page_dirty = ramdisk_set_page_dirty, + .set_page_dirty = __set_page_dirty_buffers, .writepages = ramdisk_writepages, }; -- 1.5.3.rc6.17.g1911 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] rd: Mark ramdisk buffers heads dirty
I have not observed this case but it is possible to get a dirty page cache with clean buffer heads if we get a clean ramdisk page with buffer heads generated by a filesystem calling __getblk and then write to that page from user space through the block device. Then we just need to hit the proper window and try_to_free_buffers() will mark that page clean and eventually drop it. Ouch! To fix this use the generic __set_page_dirty_buffers in the ramdisk code so that when we mark a page dirty we also mark it's buffer heads dirty. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- drivers/block/rd.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) diff --git a/drivers/block/rd.c b/drivers/block/rd.c index 701ea77..84163da 100644 --- a/drivers/block/rd.c +++ b/drivers/block/rd.c @@ -178,23 +178,12 @@ static int ramdisk_writepages(struct address_space *mapping, return 0; } -/* - * ramdisk blockdev pages have their own -set_page_dirty() because we don't - * want them to contribute to dirty memory accounting. - */ -static int ramdisk_set_page_dirty(struct page *page) -{ - if (!TestSetPageDirty(page)) - return 1; - return 0; -} - static const struct address_space_operations ramdisk_aops = { .readpage = ramdisk_readpage, .prepare_write = ramdisk_prepare_write, .commit_write = ramdisk_commit_write, .writepage = ramdisk_writepage, - .set_page_dirty = ramdisk_set_page_dirty, + .set_page_dirty = __set_page_dirty_buffers, .writepages = ramdisk_writepages, }; -- 1.5.3.rc6.17.g1911 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/