Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton
On Sun, 7 Jan 2007 12:36:18 +1030 "Tom Lanyon" <[EMAIL PROTECTED]> wrote: > On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > What would also actually be interesting is whether somebody can reproduce > > this on Reiserfs, for example. I _think_ all the reports I've seen are on > > ext2

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 1/7/07, Tom Lanyon <[EMAIL PROTECTED]> wrote: I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far... -- Tom Lanyon

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Dave Jones
On Fri, Dec 29, 2006 at 07:52:15PM +0100, maximilian attems wrote: > > The only -mm stuff I recall being in the Fedora 2.6.18 is > > the inode-diet stuff which ended up in 2.6.19, though the xmas > > break has left my head somewhat empty so I may be forgetting something. > > What patch in

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread maximilian attems
On Fri, Dec 29, 2006 at 10:02:53AM -0500, Dave Jones wrote: > On Fri, Dec 29, 2006 at 10:23:14AM +0100, maximilian attems wrote: > > > On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote: > > > > > That was a Fedora kernel. Has anyone seen the corruption in vanilla > 2.6.18 > > >

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Guillaume Chazarain
Linus Torvalds wrote: going back to Linux-2.6.5 at least, according to one tester). I apologize for the confusion, but it just occurred to me that I was actually experiencing a totally different problem: I set a root filesystem of 3 MiB for qemu, so the test program just didn't have

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Dave Jones
On Fri, Dec 29, 2006 at 10:23:14AM +0100, maximilian attems wrote: > > On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote: > > > > > > > > > On Thu, 28 Dec 2006, Petri Kaukasoina wrote: > > > > > me up), and that seems to show the corruption going way way back > > (ie

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread maximilian attems
> On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote: > > > > > > On Thu, 28 Dec 2006, Petri Kaukasoina wrote: > > > > me up), and that seems to show the corruption going way way back (ie > going > > > > back to Linux-2.6.5 at least, according to one tester). > > > > > >

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Andrew Morton
On Thu, 28 Dec 2006 17:38:38 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > in > the hope that somebody else is working on this corruption issue and is > interested.. What corruption issue? ;) I'm finding that the corruption happens trivially with your test app, but apparently

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
Btw, much cleaned-up page tracing patch here, in case anybody cares (and "test.c" attached, although I don't think it changed since last time). The test.c output is a bit hard to read at times, since it will give offsets in bytes as hex (ie "00a77664" means page frame 0a77, and byte
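
The hex offsets are plain byte offsets into the test file. With 4 KiB pages, the page index within the file (what Linus calls the page frame) is the offset shifted right by 12 bits, and the byte within that page is the low 12 bits. A minimal C sketch of that decoding, assuming a 4096-byte page size; the helper names are illustrative and do not come from test.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, as on the 4kB-block filesystems in this thread. */
#define PAGE_SHIFT_ASSUMED 12
#define PAGE_MASK_ASSUMED  ((1u << PAGE_SHIFT_ASSUMED) - 1)

static_assert(PAGE_MASK_ASSUMED == 0xfffu, "expected 4 KiB pages");

/* Page index within the file: drop the low 12 bits of the offset. */
static uint32_t offset_to_page(uint32_t offset)
{
    return offset >> PAGE_SHIFT_ASSUMED;
}

/* Byte position inside that page: keep only the low 12 bits. */
static uint32_t offset_in_page(uint32_t offset)
{
    return offset & PAGE_MASK_ASSUMED;
}

/* Print an offset in the same style as the trace discussion. */
static void print_offset(uint32_t offset)
{
    printf("%08x -> page %04x, byte %03xh\n",
           (unsigned)offset,
           (unsigned)offset_to_page(offset),
           (unsigned)offset_in_page(offset));
}
```

For example, `print_offset(0x00a77664)` prints `00a77664 -> page 0a77, byte 664h`.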

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Anton Altaparmakov
On Thu, 28 Dec 2006, Linus Torvalds wrote: > Ok, > with the ugly trace capture patch, I've actually captured this corruption > in action, I think. > > I did a full trace of all pages involved in one run, and picked one > corruption at random: > > Chunk 14465 corrupted (0-75)

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
On Thu, 28 Dec 2006, Anton Altaparmakov wrote: > > But are chunks 3 and 4 in separate buffer heads? Sorry could not see it > immediately from the output you showed... No, this is a 4kB filesystem. A single bh per page. > It is just that there may be a different cause rather than buffer

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
On Thu, 28 Dec 2006, David Miller wrote: > > What happens when we writeback, to the PTEs? Not a damn thing. We clear the PTE's _before_ we even start the write. The writeback does nothing to them. If the user dirties the page while writeback is in progress, we'll take the page fault and
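
The ordering Linus describes (clean and write-protect the PTE before the I/O starts, then let any later store fault and re-dirty the page) can be illustrated with a toy state machine. This is a hypothetical C model for illustration only: `struct toy_page` and the `toy_*` functions are invented names and do not mirror the kernel's data structures or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not kernel code): one page of a shared, writable file mapping. */
struct toy_page {
    bool pte_writable; /* a store can proceed without faulting */
    bool pte_dirty;    /* dirty bit in the PTE */
    bool page_dirty;   /* page-level dirty flag */
    bool writeback;    /* I/O in flight */
};

/* A user store. If the PTE is write-protected, the store faults first,
 * which is where the kernel notices it and marks the page dirty. */
static void toy_user_write(struct toy_page *p)
{
    if (!p->pte_writable) {
        p->page_dirty = true;
        p->pte_writable = true;
    }
    p->pte_dirty = true; /* hardware sets the PTE dirty bit on the store */
}

/* Start writeback: clean and write-protect the PTE *before* the I/O,
 * so the page counts as clean while the write is in flight. */
static void toy_start_writeback(struct toy_page *p)
{
    assert(p->page_dirty || p->pte_dirty);
    assert(!p->writeback);
    p->pte_dirty = false;
    p->pte_writable = false;
    p->page_dirty = false;
    p->writeback = true;
}

/* The writeback path itself does not touch the PTE. */
static void toy_end_writeback(struct toy_page *p)
{
    p->writeback = false;
}
```

In this model a store made while `writeback` is set takes the fault path again, so `page_dirty` ends up true and the new data is scheduled for a later write instead of being lost.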

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread David Miller
From: Linus Torvalds <[EMAIL PROTECTED]> Date: Thu, 28 Dec 2006 14:37:37 -0800 (PST) > So if we're not losing any dirty bits, what's going on? What happens when we writeback, to the PTEs? page_mkclean_file() iterates the VMAs and when it finds a shared one it goes: entry =

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
Ok, with the ugly trace capture patch, I've actually captured this corruption in action, I think. I did a full trace of all pages involved in one run, and picked one corruption at random: Chunk 14465 corrupted (0-75) (01423fb4-01423fff) Expected 129, got 0 Written as
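
The numbers in this report fit a fixed chunk size of 1460 bytes: 14465 × 1460 = 21118900 = 0x01423fb4. That chunk size is inferred here from the arithmetic; the excerpt does not state it. Under that assumption, bytes 0-75 of chunk 14465 are exactly the last 76 bytes of file page 0x1423 (0xfb4 through 0xfff), so the damaged span stops at a page boundary. A small C sketch that checks this; the names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1460-byte chunks (inferred from the report's offsets)
 * and 4 KiB pages. */
#define CHUNK_SIZE_ASSUMED 1460u
#define PAGE_SIZE_ASSUMED  4096u

/* File offset of byte `byte` within chunk number `chunk`. */
static uint64_t chunk_byte_offset(uint64_t chunk, uint32_t byte)
{
    assert(byte < CHUNK_SIZE_ASSUMED);
    return chunk * CHUNK_SIZE_ASSUMED + byte;
}

/* True if [first, last] lies inside one page and `last` is that page's
 * final byte, i.e. the range ends exactly at a page boundary. */
static bool ends_at_page_boundary(uint64_t first, uint64_t last)
{
    return first / PAGE_SIZE_ASSUMED == last / PAGE_SIZE_ASSUMED &&
           last % PAGE_SIZE_ASSUMED == PAGE_SIZE_ASSUMED - 1;
}
```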

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Russell King
On Thu, Dec 28, 2006 at 01:24:30PM -0800, Linus Torvalds wrote: > On Thu, 28 Dec 2006, Linus Torvalds wrote: > > > > What we need now is actually looking at the source code, and people who > > understand the VM, I'm afraid. I'm gathering traces now that I have a good > > test-case. I'll post my

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
On Thu, 28 Dec 2006, Linus Torvalds wrote: > > What we need now is actually looking at the source code, and people who > understand the VM, I'm afraid. I'm gathering traces now that I have a good > test-case. I'll post my trace tools once I've tested that they work, in > case others want to

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Arjan van de Ven
On Thu, 2006-12-28 at 14:39 -0500, Dave Jones wrote: > On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote: > > > > > > On Thu, 28 Dec 2006, Petri Kaukasoina wrote: > > > > me up), and that seems to show the corruption going way way back (ie > going > > > > back to Linux-2.6.5

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Dave Jones
On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote: > > > On Thu, 28 Dec 2006, Petri Kaukasoina wrote: > > > me up), and that seems to show the corruption going way way back (ie > > > going > > > back to Linux-2.6.5 at least, according to one tester). > > > > That was a

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
On Thu, 28 Dec 2006, Petri Kaukasoina wrote: > > me up), and that seems to show the corruption going way way back (ie going > > back to Linux-2.6.5 at least, according to one tester). > > That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18 > (or older)? Well, that was a

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Petri Kaukasoina
On Thu, Dec 28, 2006 at 11:00:46AM -0800, Linus Torvalds wrote: > And I have a test-program that shows the corruption _much_ easier (at > least according to my own testing, and that of several reporters that back > me up), and that seems to show the corruption going way way back (ie going >
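
The general shape of such a test is: write known data through a shared file mapping, flush it, then read the file back through the ordinary read path and compare. The C sketch below shows only that pattern. It is not Linus's test.c, and by itself it will not trigger the race discussed here, which also needs writeback running concurrently with the stores (for example, under memory pressure). The 1460-byte chunk size and the per-chunk fill value are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK 1460 /* assumption: illustrative chunk size */

/* Every byte of chunk i holds the same value, derived from i (never 0). */
static unsigned char chunk_value(size_t i)
{
    return (unsigned char)(i % 255 + 1);
}

/* Write nchunks through a MAP_SHARED mapping of fd, msync() it, then
 * re-read with pread(). Returns the number of mismatching chunks, or -1. */
static long mmap_write_verify(int fd, size_t nchunks)
{
    size_t len = nchunks * CHUNK;
    if (ftruncate(fd, (off_t)len) != 0)
        return -1;
    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < nchunks; i++)
        memset(map + i * CHUNK, chunk_value(i), CHUNK);
    if (msync(map, len, MS_SYNC) != 0) {
        munmap(map, len);
        return -1;
    }
    munmap(map, len);

    unsigned char buf[CHUNK];
    long bad = 0;
    for (size_t i = 0; i < nchunks; i++) {
        if (pread(fd, buf, CHUNK, (off_t)(i * CHUNK)) != CHUNK)
            return -1;
        for (size_t j = 0; j < CHUNK; j++) {
            if (buf[j] != chunk_value(i)) {
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

On a kernel without the bug, this reports zero bad chunks.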

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds
On Thu, 28 Dec 2006, Marc Haber wrote: > > After being up for ten days, I have now encountered the file > corruption of pkgcache.bin for the first time again. The 256 MB i386 > box is like 26M in swap, is under very moderate load. > > I am running plain vanilla 2.6.19.1. Is there a patch that

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Marc Haber
On Tue, Dec 19, 2006 at 09:51:49AM +0100, Marc Haber wrote: > On Sun, Dec 17, 2006 at 09:43:08PM -0800, Andrew Morton wrote: > > Six hours here of fsx-linux plus high memory pressure on SMP on 1k > > blocksize ext3, mainline. Zero failures. It's unlikely that this testing > > would pass, yet

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: > > For s390 there are two aspects to consider: > 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: > What do you guys think? Does something like this work out for S/390 too? I > tried to make that "ptep_flush_dirty()" concept work for architectures > that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: > What would also actually be interesting is whether somebody can reproduce > this on Reiserfs, for example. I _think_ all the reports I've seen are on > ext2 or ext3, and if this is somehow writeback-related, it could be some >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: - It never uses mprotect on the shared mappings, but it _does_ do: "mincore()" - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I
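
For context, `mincore()` only reports which pages of a mapping are currently resident in memory; it does not change the mapping. A minimal C sketch of the kind of residency query an application might use as a heuristic. The helper name is illustrative, and this is not rtorrent's code:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len). addr must be page-aligned
 * (mmap() results are). Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages ? npages : 1);
    if (!vec)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long count = 0;
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1; /* low bit set means the page is resident */
    free(vec);
    return count;
}
```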

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: snip - It never uses mprotect on the shared mappings, but it _does_ do: mincore() - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: What do you guys think? Does something like this work out for S/390 too? I tried to make that ptep_flush_dirty() concept work for architectures that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to consider:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: For s390 there are two aspects to consider: 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds
On Tue, 26 Dec 2006, Nick Piggin wrote: > Linus Torvalds wrote: > > > > Ok, so how about this diff. > > > > I'm actually feeling good about this one. It really looks like > > "do_no_page()" was simply buggy, and that this explains everything. > > Still trying to catch up here, so I'm not

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote: > On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > > > Hash check on download completion found bad chunks, consider using > > > "safe_sync". > > > > Dang. Did you

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > > Linus BTW,

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin
Linus Torvalds wrote: On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-24 11:35]: > And if this doesn't fix it, I don't know what will.. Sorry, but it still fails (on top of plain 2.6.19). -- Martin Michlmayr http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin
> Quoting Linus Torvalds <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content > corruption on ext3) > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > -

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like "do_no_page()" was simply buggy, and that this explains everything. I tested with just this patch and 2.6.19 and no change. Sorry Linus, no early

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > only these: ACPI: EC: evaluating _Q80

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: > > Hash check on download completion found bad chunks, consider using > "safe_sync". Dang. Did you get any warning messages from the kernel? Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > > this patch and 2.6.19. > > Yeah, if my guess about do_no_page() is right, _none_ of the previous > patches

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > - writable > - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really
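
Linus's condition reads naturally as an invariant check. Under the new dirty-tracking rules, a clean PTE in a shared mapping is kept write-protected so that the first store faults and the kernel sees it; a PTE that is writable but not dirty would let a store dirty the page silently. A hypothetical C predicate expressing that, assuming the mapping uses this write-notify style of tracking (`struct pte_flags` is invented for illustration and is not the kernel's `pte_t`):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one PTE, for illustration only. */
struct pte_flags {
    bool shared;   /* mapping is MAP_SHARED */
    bool writable; /* PTE permits writes without a fault */
    bool dirty;    /* PTE dirty bit is set */
};

/* The state Linus calls a bug: shared, writable, not already dirty. */
static bool pte_violates_dirty_tracking(struct pte_flags f)
{
    return f.shared && f.writable && !f.dirty;
}
```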

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > How about this particularly stupid diff? (please test with something that > _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > > > > I don't have corruption. I tested twice. > > > > This is a surprising result. Can you

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 14:14:38 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > - mount the fs with ext2 with the no-buffer-head option. That means > > > either: > > > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-24 00:57]: > /etc/fstab: ext2 nobh > /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > > - mount the fs with ext2 with the no-buffer-head option. That means either: > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > /etc/fstab: ext2 nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext2

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:26:01 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > I also tested with ext3 ordered, nobh and I have file corruption... ordered+nobh isn't a possible combination. The filesystem probably ignored nobh. nobh mode only makes sense with data=writeback.
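
Andrew's rule can be written down as a check on a mount's type and option string. A minimal C sketch that encodes only what this thread states (ext2 honors nobh; ext3 honors it only together with data=writeback). `nobh_effective()` and `has_option()` are hypothetical helpers, not part of any mount tool, and the sketch says nothing about userspace tools such as busybox mount dropping the option before it reaches the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `opt` appears as a whole comma-separated token in `opts`. */
static bool has_option(const char *opts, const char *opt)
{
    size_t n = strlen(opt);
    const char *p = opts;
    while (p && *p) {
        if (strncmp(p, opt, n) == 0 && (p[n] == ',' || p[n] == '\0'))
            return true;
        p = strchr(p, ',');
        if (p)
            p++; /* skip the comma to reach the next token */
    }
    return false;
}

/* Would a requested "nobh" actually take effect for this mount? */
static bool nobh_effective(const char *fstype, const char *opts)
{
    if (!has_option(opts, "nobh"))
        return false;
    if (strcmp(fstype, "ext2") == 0)
        return true;
    if (strcmp(fstype, "ext3") == 0)
        return has_option(opts, "data=writeback");
    return false;
}
```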

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote: > On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > I now _suspect_ that we're talking about something like > > > > > > - we started

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrew Morton wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked clean and is now in the "writeback" phase. > > - a write happens to the page, and the page gets

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > I now _suspect_ that we're talking about something like > > - we started a writeout. The IO is still pending, and the page was >marked clean and is now in the "writeback" phase. > - a write happens to the

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > Is there any way to provide any debugging information that may help > solve the problem ? I think we have people working on this. I know I'm trying to even come up with an idea of what is going on. I don't think we know yet. > Would it help

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa [EMAIL PROTECTED] wrote: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Andrew Morton [EMAIL PROTECTED] [2006-12-24 00:57]: /etc/fstab: ext2 nobh /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa [EMAIL PROTECTED] wrote: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: Andrei Popa [EMAIL PROTECTED] wrote: /dev/sda7 on / type ext3 (rw,noatime,nobh) I don't have corruption. I tested twice. This is a surprising result. Can you please retest ext3

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: Andrei Popa [EMAIL PROTECTED] wrote: /dev/sda7 on / type ext3 (rw,noatime,nobh) I don't have

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me I'm

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really looks
