Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton
On Sun, 7 Jan 2007 12:36:18 +1030 "Tom Lanyon" <[EMAIL PROTECTED]> wrote: > On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > What would also actually be interesting is whether somebody can reproduce > > this on Reiserfs, for example. I _think_ all the reports I've seen are on > > ext2 or

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 1/7/07, Tom Lanyon <[EMAIL PROTECTED]> wrote: I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far... -- Tom Lanyon - To

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shar

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: > > For s390 there are two aspects to consider: > 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte lo

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: > What do you guys think? Does something like this work out for S/390 too? I > tried to make that "ptep_flush_dirty()" concept work for architectures > that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to consi

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which we

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: > What would also actually be interesting is whether somebody can reproduce > this on Reiserfs, for example. I _think_ all the reports I've seen are on > ext2 or ext3, and if this is somehow writeback-related, it could be some > bug

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: - It never uses mprotect on the shared mappings, but it _does_ do: "mincore()" - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I do

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds
On Tue, 26 Dec 2006, Nick Piggin wrote: > Linus Torvalds wrote: > > > > Ok, so how about this diff. > > > > I'm actually feeling good about this one. It really looks like > > "do_no_page()" was simply buggy, and that this explains everything. > > Still trying to catch up here, so I'm not goin

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote: > On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > > > Hash check on download completion found bad chunks, consider using > > > "safe_sync". > > > > Dang. Did you

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > > Linus BTW, rm

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin
Linus Torvalds wrote: On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-24 11:35]: > And if this doesn't fix it, I don't know what will.. Sorry, but it still fails (on top of plain 2.6.19). -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin
> Quoting Linus Torvalds <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content > corruption on ext3) > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like "do_no_page()" was simply buggy, and that this explains everything. I tested with just this patch and 2.6.19 and no change. Sorry Linus, no early C

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > only these: ACPI: EC: evaluating _Q80 A

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: > > Hash check on download completion found bad chunks, consider using > "safe_sync". Dang. Did you get any warning messages from the kernel? Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > > this patch and 2.6.19. > > Yeah, if my guess about do_no_page() is right, _none_ of the previous > patches

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the "ex

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > - writable > - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really lo
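
The three-part condition quoted in this message (shared mapping, writable, not already marked dirty in the PTE) can be written down as a simple predicate. What follows is only a minimal userspace sketch: `struct model_mapping` and `model_is_suspect()` are hypothetical names invented for illustration, not kernel types or functions, and the diff Linus refers to is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the state under discussion.
 * These are NOT kernel types; the field names are assumptions. */
struct model_mapping {
    bool shared;     /* mapping is shared rather than private */
    bool writable;   /* the PTE currently permits writes */
    bool pte_dirty;  /* the dirty bit is already set in the PTE */
};

/* Returns true for the combination the message calls a bug under the
 * "new rules": shared + writable + not yet marked dirty in the PTE. */
bool model_is_suspect(const struct model_mapping *m)
{
    assert(m != NULL);
    return m->shared && m->writable && !m->pte_dirty;
}
```

Only one of the eight possible flag combinations is flagged; a private mapping, a read-only PTE, or an already-dirty PTE all fall outside the condition.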

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > How about this particularly stupid diff? (please test with something that > _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me I'm

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > > > > I don't have corruption. I tested twice. > > > > This is a surprising result. Can you pleas

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 14:14:38 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > - mount the fs with ext2 with the no-buffer-head option. That means > > > either: > > > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > > /etc/f

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-24 00:57]: > /etc/fstab: ext2 nobh > /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin Michl

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > > - mount the fs with ext2 with the no-buffer-head option. That means either: > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > /etc/fstab: ext2 nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext2 (rw,noatime

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:26:01 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > I also tested with ext3 ordered, nobh and I have file corruption... ordered+nobh isn't a possible combination. The filesystem probably ignored nobh. nobh mode only makes sense with data=writeback.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote: > On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > I now _suspect_ that we're talking about something like > > > > > > - we started a

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrew Morton wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked clean and is now in the "writeback" phase. > > - a write happens to the page, and the page gets mar

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > I now _suspect_ that we're talking about something like > > - we started a writeout. The IO is still pending, and the page was >marked clean and is now in the "writeback" phase. > - a write happens to the

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > Is there any way to provide any debugging information that may help > solve the problem ? I think we have people working on this. I know I'm trying to even come up with an idea of what is going on. I don't think we know yet. > Would it help t

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote: > * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > > With all three patches I have corruption > > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... but I

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again together with Linus' patch and the two from Andrew and it st

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Linus Torvalds
On Fri, 22 Dec 2006, Peter Zijlstra wrote: > > fix page_mkclean_one() > > - add flush_cache_page() for all those virtual indexed cache >architectures. I think the flush_cache_page() should be after we've actually flushed it from the TLB and re-inserted it (this is one reason why I did th

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-22 08:30]: > Based on the kernel gurus current knowledge of the problem, would > you expect the corruption to occur at the same point in a file, or > is it possible that the corruption could occur at different points > on successive Debian installer

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: sh-3.1# ls -l /var/cache/apt/ total 12524 drwxr-xr-x 3 root root 12288 Dec 22 04:41 archives -r

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Patrick Mau
On Fri, Dec 22, 2006 at 01:32:49PM +0100, Martin Michlmayr wrote: > * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > > With all three patches I have corruption > > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again..

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: sh-3.1# ls -l /var/cache/apt/ total 5252 drwxr-xr-x 3 root root 12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin -rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin This listing is a littl

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson
On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Andrew located at least one bug: we run cancel_dirty_page() too late in "truncate_complete_page()", which means that do_invalidatepage() ends up not clearing the page cache. His patch is appended. Thanks. I'll try this out later today.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra
A cleanup of try_to_unmap. I have not identified any races that this would solve, but for consistency's sake. Also includes a small s390 optimization by moving page_test_and_clear_dirty() out of the vma iteration. From: Peter Zijlstra <[EMAIL PROTECTED]> We clear the page in the following sequ

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra
On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote: > * Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]: > > I've completed one installation with Linus' patch plus the two from > > Andrew successfully, but I'm currently trying again... > > and it failed. Since you are on ARM y

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]: > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... ... and it failed. -- Martin Michlmayr http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really need a better testcase since an installation takes about an h
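
A quicker check than a full installer run would write through a shared writable mapping and then compare the file contents against what was written. The sketch below only shows the shape of such a check, under stated assumptions: `shared_mmap_roundtrip()` and its byte pattern are made up for illustration, it is not a testcase posted in this thread, and a single pass with no memory pressure or concurrent writeback is unlikely to trigger a race on its own.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill `path` with a known pattern through a
 * MAP_SHARED mapping, sync it, then read it back with pread(2).
 * Returns 0 if every byte matches, 1 on a mismatch, -1 on error. */
int shared_mmap_roundtrip(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Dirty every page through the mapping, not through write(2). */
    for (size_t i = 0; i < len; i++)
        map[i] = (unsigned char)(i * 31 + 7);

    if (msync(map, len, MS_SYNC) != 0) {
        munmap(map, len);
        close(fd);
        return -1;
    }
    munmap(map, len);

    /* Read the file back and compare against the expected pattern. */
    int result = 0;
    unsigned char buf[4096];
    size_t off = 0;
    while (off < len) {
        size_t want = len - off < sizeof buf ? len - off : sizeof buf;
        ssize_t got = pread(fd, buf, want, (off_t)off);
        if (got <= 0) {
            result = -1;
            break;
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != (unsigned char)((off + (size_t)i) * 31 + 7))
                result = 1;
        }
        off += (size_t)got;
    }

    close(fd);
    return result;
}
```

To be useful against a bug of this kind, a check like this would presumably have to run in a loop on a file larger than RAM or alongside other I/O; those choices are left open here.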

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa
With all three patches I have corruption diff --git a/fs/buffer.c b/fs/buffer.c index d1f1b54..263f88e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-22 02:17]: > > This hunk (on top of git from about 2 days ago and your latest patch) > > results in the installer hanging right at the start. > > You'll need this also: It starts again, thanks. -- Martin Michlmayr http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:10]: > > immediately when I started wget, the hanging apt-get process > > continued. > ... and now that we've completed this step, the apt cache has suddenly > been reduced (see Gordon's mail for an explanation) and it segfaults: One of my ques

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrew Morton
On Fri, 22 Dec 2006 11:00:04 +0100 Martin Michlmayr <[EMAIL PROTECTED]> wrote: > > - if (TestClearPageDirty(page) && account_size) > > + if (TestClearPageDirty(page) && account_size) { > > + dec_zone_page_state(page, NR_FILE_DIRTY); > > task_io_account_cancelled_write(acc

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:06]: > Okay, it's really weird. So apt-get just hangs doing nothing and I > cannot even kill it. I just tried to download strace via wget and > immediately when I started wget, the hanging apt-get process > continued. ... and now that we've c

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:00]: > This time, however, I let the installer continue and it seems that > with your patch apt now works where it failed in the past, but it > hangs later on. It's pretty weird because I cannot even kill the > process: Okay, it's really weird

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-21 21:20]: > generating these files, pkgcache.bin grows to 12582912 bytes, and when > apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is > 64254483 bytes. This time, when apt-get exited, it had only created > pkgcache.bin which w

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-21 20:54]: > But it sounds like I probably misunderstood something, because I thought > that Martin had acknowledged that this patch actually worked for him. That's what I thought too but now I can confirm what Gordon sees. But it's pretty weird. Our

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds
On Thu, 21 Dec 2006, Gordon Farquharson wrote: > > I tested 2.6.19 with a version of Linus's patch that applies cleanly > to 2.6.19 (patch appended to the end of this email) on ARM and apt-get > failed. It did not segfault this time, but instead got stuck for about > 20 to 30 minutes and was acc

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Gordon Farquharson
On 12/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote: > Can the call to task_io_account_cancelled_write() simply be removed > from cancel_dirty_page() for testing the patch with 2.6.19 (since > 2.6.19 doesn't seem to have the task I/O accounting) ? Yes. I tested 2.6.19 with a version of Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds
On Thu, 21 Dec 2006, Peter Zijlstra wrote: > > Also, I'm dubious about the while thing and stuck a WARN_ON(ret) thing > at the beginning of the loop. flush_tlb_page() does IPI the other cpus > to flush their tlb too, so there should not be a SMP race, Arjan? Now, the reason I think the loop may

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds
On Wed, 20 Dec 2006, Trond Myklebust wrote: > > I can't see that it is the business of invalidate_inode_pages2() to > resolve races between ->direct_IO() and pages that are redirtied by > mmap(). All it needs to ensure is that pages that clean are discarded, > since those are neither consistent

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds
On Thu, 21 Dec 2006, Andrei Popa wrote: > On Wed, 2006-12-20 at 16:24 -0800, Linus Torvalds wrote: > > > > Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may be > > talking about different bugs, so _both_ of your experiences definitely > > matter here). > > with http://lkml.org

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Andrei Popa
On Wed, 2006-12-20 at 16:24 -0800, Linus Torvalds wrote: > > Btw, I'd really love to hear whether the patch I sent out actually _helps_ > at all, or whether we're just discussing something that in the end is just > a cleanup.. > > Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Russell King
On Thu, Dec 21, 2006 at 12:30:22PM +, Russell King wrote: > On Wed, Dec 20, 2006 at 11:53:25PM -0800, Linus Torvalds wrote: > > That's obviously a bug worth fixing on its own. Do you know when it > > started? > > My last merge, just before 2.6.19-rc1. Obviously 2.6.20-rc1. -- Russell King

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Russell King
On Wed, Dec 20, 2006 at 11:53:25PM -0800, Linus Torvalds wrote: > That's obviously a bug worth fixing on its own. Do you know when it > started? My last merge, just before 2.6.19-rc1. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-20 11:50]: > Martin, Andrei, does this make any difference for your corruption > cases? Works for me. -- Martin Michlmayr http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Russell King
On Thu, Dec 21, 2006 at 09:18:45AM +0100, Martin Michlmayr wrote: > * Russell King <[EMAIL PROTECTED]> [2006-12-20 22:11]: > > > This patch doesn't fix my problem (apt segfaults on ARM because its > > > database is corrupted). > > > > Are you using IDE in PIO mode? If so, the bug probably lies th

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 10:20 +0100, Peter Zijlstra wrote: > > Now you are flushing the tlb twice. ptep_clear_flush clears the pte and > > flushes the tlb, ptep_establish sets the new pte and flushes the tlb. > > Not good. Use set_pte_at instead of the ptep_establish. > > Yeah, sorry, I already noti
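
The double flush described in the quoted text can be made concrete with a toy counter. This is a userspace model only: the `model_*` functions and `MODEL_PTE_*` bits are hypothetical stand-ins, not the kernel's ptep_clear_flush(), ptep_establish() or set_pte_at(), and the flush behaviour attributed to each is taken from the description in the message rather than from any architecture's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PTE bits for the model; real layouts are per-architecture. */
enum { MODEL_PTE_DIRTY = 0x1, MODEL_PTE_WRITE = 0x2, MODEL_PTE_PRESENT = 0x4 };

struct model_mmu {
    unsigned long pte;        /* the single PTE being modelled */
    unsigned int tlb_flushes; /* how many TLB flushes were issued */
};

/* Models "clears the pte and flushes the tlb". */
static unsigned long model_clear_flush(struct model_mmu *m)
{
    unsigned long old = m->pte;
    m->pte = 0;
    m->tlb_flushes++;
    return old;
}

/* Models "sets the new pte and flushes the tlb". */
static void model_establish(struct model_mmu *m, unsigned long pte)
{
    m->pte = pte;
    m->tlb_flushes++;
}

/* Models a plain PTE store with no flush. */
static void model_set_pte_at(struct model_mmu *m, unsigned long pte)
{
    m->pte = pte;
}

/* Clean and write-protect using clear+flush followed by establish. */
unsigned int model_mkclean_establish(struct model_mmu *m)
{
    assert(m != NULL);
    unsigned int before = m->tlb_flushes;
    unsigned long entry = model_clear_flush(m);
    entry &= ~(unsigned long)(MODEL_PTE_DIRTY | MODEL_PTE_WRITE);
    model_establish(m, entry);
    return m->tlb_flushes - before;
}

/* Same result, but the second step is a plain store. */
unsigned int model_mkclean_set_pte(struct model_mmu *m)
{
    assert(m != NULL);
    unsigned int before = m->tlb_flushes;
    unsigned long entry = model_clear_flush(m);
    entry &= ~(unsigned long)(MODEL_PTE_DIRTY | MODEL_PTE_WRITE);
    model_set_pte_at(m, entry);
    return m->tlb_flushes - before;
}
```

Both variants leave the same PTE value behind; in this model the first costs two flushes and the second one, which is the point of the suggestion to use set_pte_at.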

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Andrew Morton
On Thu, 21 Dec 2006 02:17:05 -0700 "Gordon Farquharson" <[EMAIL PROTECTED]> wrote: > Can the call to task_io_account_cancelled_write() simply be removed > from cancel_dirty_page() for testing the patch with 2.6.19 (since > 2.6.19 doesn't seem to have the task I/O accounting) ? Yes.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Peter Zijlstra
On Thu, 2006-12-21 at 10:16 +0100, Martin Schwidefsky wrote: > On Thu, 2006-12-21 at 00:03 +0100, Peter Zijlstra wrote: > > current version > > Nitpicking .. > > > @@ -444,17 +444,18 @@ static int page_mkclean_one(struct page > > if (!pte) > > goto out; > > > > - if (!pte_dirty

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Gordon Farquharson
On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: That said, I think the patch I sent out should actually work on top of plain 2.6.19 too. I don't think things have changed in this area that much. IOW, you don't _need_ latest -git to test it, you just need a broken kernel ;) I created a v

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 00:03 +0100, Peter Zijlstra wrote: > current version Nitpicking .. > @@ -444,17 +444,18 @@ static int page_mkclean_one(struct page > if (!pte) > goto out; > > - if (!pte_dirty(*pte) && !pte_write(*pte)) > - goto unlock; > + while (pte

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds
On Thu, 21 Dec 2006, Martin Michlmayr wrote: > > This is a known issue. The following patch has been proposed > http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=4030/1 > although I just notice that it has been marked as "discarded". > Apparently Russell King committed a better patc

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-20 23:53]: > > Unfortunately, I cannot get the latest git version of the kernel to > > boot on the ARM machine on which Martin and I are experiencing the apt > > segfault. > > Ouch. > > That's obviously a bug worth fixing on its own. Do you know when

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Michlmayr
* Russell King <[EMAIL PROTECTED]> [2006-12-20 22:11]: > > This patch doesn't fix my problem (apt segfaults on ARM because its > > database is corrupted). > > Are you using IDE in PIO mode? If so, the bug probably lies there. I'm using usb-storage. It's used to access an external IDE drive in a

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Peter Zijlstra
On Wed, 2006-12-20 at 21:36 -0500, Trond Myklebust wrote: > On Wed, 2006-12-20 at 23:15 +0100, Peter Zijlstra wrote: > > I think this is also needed: > > NAK > > invalidate_inode_pages2() should _not_ be pretending that dirty pages > are clean. This patch is incorrect both for the NFS usage and f

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Thu, 21 Dec 2006, Gordon Farquharson wrote: > > Unfortunately, I cannot get the latest git version of the kernel to > boot on the ARM machine on which Martin and I are experiencing the apt > segfault. Ouch. > After the kernel is finished uncompressing it prints "done, > booting the kernel."

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Gordon Farquharson
On 12/20/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Ok, I'll just put my money where my mouth is, and suggest a patch like THIS instead. Martin, Andrei, does this make any difference for your corruption cases? Unfortunately, I cannot get the latest git version of the kernel to boot on th

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Trond Myklebust
On Wed, 2006-12-20 at 15:55 -0800, Linus Torvalds wrote: > > With your change I think what'll happen is that we'll correctly handle the > > case where the page and its buffers are dirty (it gets left in place), but > > we'll needlessly fail in the case where the page is dirty but the buffers > > are

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Trond Myklebust
On Wed, 2006-12-20 at 23:15 +0100, Peter Zijlstra wrote: > I think this is also needed: NAK invalidate_inode_pages2() should _not_ be pretending that dirty pages are clean. This patch is incorrect both for the NFS usage and for the directIO usage. In the latter case, if someone has the page mmap

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread David Chinner
On Wed, Dec 20, 2006 at 03:55:25PM -0800, Linus Torvalds wrote: > On Thu, 21 Dec 2006, David Chinner wrote: > > > > XFS appears to call clear_page_dirty to get the mapping tree dirty > > tag set correctly at the same time the page dirty flag is cleared. I > > note that this can be done by set_page

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Andrew Morton
On Wed, 20 Dec 2006 16:43:31 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Wed, 20 Dec 2006, Linus Torvalds wrote: > > > > > > There's also redirty_page_for_writepage(). > > > > _dirtying_ a page makes sense in any situation. You can always dirty them. > > I'm just saying tha

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Linus Torvalds wrote: > > > > There's also redirty_page_for_writepage(). > > _dirtying_ a page makes sense in any situation. You can always dirty them. > I'm just saying that you can't just mark them *clean*. > > If your point was that the filesystem had better be able to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
Btw, I'd really love to hear whether the patch I sent out actually _helps_ at all, or whether we're just discussing something that in the end is just a cleanup.. Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may be talking about different bugs, so _both_ of your experiences def

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Andrew Morton wrote: > > > > So with my change, afaik, we will just return EIO to the invalidate, and > > do the write. > > The write's already been done by this stage. Ok, but the end result is the same: you MUST NOT just "cancel" a write. It needs to be done, or the ba
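The rule stated in this message (a dirty page has to be written out, so an invalidate should fail rather than quietly drop it) can be pictured with a small user-space model. This is only a sketch under stated assumptions: `toy_page`, `toy_writeback` and `toy_invalidate` are hypothetical names made up for illustration and are not kernel functions; only the use of EIO as the failure code comes from the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page-cache page; not the kernel's struct page. */
struct toy_page {
    bool dirty;   /* in-memory data is newer than what is on disk */
    bool cached;  /* page is still present in the cache */
    int  writes;  /* how many times the data reached "disk" */
};

/* In this model, writing the page back is the only way to make it clean. */
static void toy_writeback(struct toy_page *p)
{
    if (p->dirty) {
        p->writes++;
        p->dirty = false;
    }
}

/*
 * Invalidate refuses to drop a dirty page: it returns -EIO and leaves the
 * page and its dirty state alone, so the pending write is never cancelled.
 */
static int toy_invalidate(struct toy_page *p)
{
    if (p->dirty)
        return -EIO;
    p->cached = false;
    return 0;
}
```

In this model a caller that gets -EIO still has the page cached and dirty; the invalidate only succeeds once `toy_writeback()` has run.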

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Andrew Morton
On Wed, 20 Dec 2006 15:55:14 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > @@ -386,12 +399,8 @@ int invalidate_inode_pages2_range(struct > > > address_space *mapping, > > > > invalidate_complete_page2() is pretty gruesome. We're handling the case > > where someone went and redirti

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Thu, 21 Dec 2006, David Chinner wrote: > > XFS appears to call clear_page_dirty to get the mapping tree dirty > tag set correctly at the same time the page dirty flag is cleared. I > note that this can be done by set_page_writeback() if we clear the > dirty flag on the page first when we are

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Andrew Morton wrote: > > > +void cancel_dirty_page(struct page *page, unsigned int account_size) > > +{ > > + /* If we're cancelling the page, it had better not be mapped any more */ > > + if (page_mapped(page)) { > > + static unsigned int warncount; > > + > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Andrew Morton
On Wed, 20 Dec 2006 11:50:50 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > Ok, I'll just put my money where my mouth is, and suggest a patch like > THIS instead. > > ... > > diff --git a/fs/buffer.c b/fs/buffer.c > index d1f1b54..263f88e 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread David Chinner
On Wed, Dec 20, 2006 at 11:50:50AM -0800, Linus Torvalds wrote: > > > On Wed, 20 Dec 2006, Linus Torvalds wrote: > > > > So that's why I've been harping on the fact that I think we simply do > > really wrong things with PG_dirty at times [ ... ] > > Ok, I'll just put my money where my mouth is

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Peter Zijlstra
On Wed, 2006-12-20 at 14:49 -0800, Linus Torvalds wrote: > > On Wed, 20 Dec 2006, Peter Zijlstra wrote: > > > > I think this is also needed: > > Yeah, that looks about right. Although I think it should go above the > "try_to_release_page()", because right now we do that "ttrp()" with the > dirt

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Dave Kleikamp
On Wed, 2006-12-20 at 14:25 -0800, Linus Torvalds wrote: > > On Wed, 20 Dec 2006, Dave Kleikamp wrote: > > > > This patch removes some questionable code that attempted to make a > > no-longer-used page easier to reclaim. > > If so, "cancel_dirty_page()" may actually be the right thing to use, but

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Peter Zijlstra wrote: > > I think this is also needed: Yeah, that looks about right. Although I think it should go above the "try_to_release_page()", because right now we do that "ttrp()" with the dirty bit set, and we should let the low-level filesystem just know that it

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Dave Kleikamp wrote: > > This patch removes some questionable code that attempted to make a > no-longer-used page easier to reclaim. If so, "cancel_dirty_page()" may actually be the right thing to use, but only if you can guarantee that the page isn't mapped anywhere (and
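The precondition described in this message (cancelling the dirty state is only acceptable when the page is not mapped anywhere) can be sketched as a toy user-space model. Everything here is an assumption for illustration: `toy_mapped_page` and `toy_cancel_dirty` are hypothetical names, not kernel code. The `cancel_dirty_page()` patch quoted elsewhere in this thread appears to warn when the page is still mapped; this sketch refuses instead, which is a deliberate simplification.

```c
#include <stdbool.h>

/* Hypothetical model of a page: a dirty flag plus a count of user mappings. */
struct toy_mapped_page {
    bool dirty;
    int  mapcount;  /* number of places the page is currently mapped */
};

/*
 * Drop the dirty state without writing anything back.
 * Returns false and changes nothing while the page is still mapped,
 * because a mapping could still hold modifications we would lose.
 */
static bool toy_cancel_dirty(struct toy_mapped_page *p)
{
    if (p->mapcount > 0)
        return false;
    p->dirty = false;
    return true;
}
```

A caller in this model would first have to drop every mapping (bringing `mapcount` to 0) before the cancel is allowed to succeed.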

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Peter Zijlstra
On Wed, 2006-12-20 at 23:15 +0100, Peter Zijlstra wrote: > I think this is also needed: See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=116603599904278&w=2 > --- > mm/truncate.c |7 +-- > 1 file changed, 1 insertion(+), 6 deletions(-) > > Index: linux-2.6/mm/truncate.c >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Peter Zijlstra
I think this is also needed: --- mm/truncate.c |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) Index: linux-2.6/mm/truncate.c === --- linux-2.6.orig/mm/truncate.c +++ linux-2.6/mm/truncate.c @@ -320,19 +320,14 @@ inva

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Russell King
On Wed, Dec 20, 2006 at 06:03:23PM +0100, Martin Michlmayr wrote: > * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-20 14:56]: > > page_mkclean_one() fix > > This patch doesn't fix my problem (apt segfaults on ARM because its > database is corrupted). Are you using IDE in PIO mode? If so, the bug

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Dave Kleikamp
On Wed, 2006-12-20 at 11:50 -0800, Linus Torvalds wrote: > NOTE NOTE NOTE! I _only_ did enough to make things compile for my > particular configuration. That means that right now the following > filesystems are broken with this patch (because they use the totally > broken old crap): > > CIF

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Peter Zijlstra
On Wed, 2006-12-20 at 11:50 -0800, Linus Torvalds wrote: > Nick, Hugh, Peter, Andrew? Comments? Hooray! I'm all for this cleanup. Let us see where this road leads..

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Linus Torvalds wrote: > > So that's why I've been harping on the fact that I think we simply do > really wrong things with PG_dirty at times [ ... ] Ok, I'll just put my money where my mouth is, and suggest a patch like THIS instead. This one clears up all the issues I f

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Martin Michlmayr wrote: > > > Anyway, the page_mkclean_one() fixes (along with _most_ things we've > > looked at) shouldn't matter on UP, at least certainly not without > > PREEMPT. > > Hmm. So what about UP without PREEMPT then... So that's why I've been harping on the f

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-20 09:35]: > Can you remind us: > - your ARM is UP, right? Do you have PREEMPT on? It's UP and PREEMPT is not set. I used 2.6.19 plus the patch that has been posted. > - This is probably a stupid question, but you did make sure that the > databa

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-20 Thread Linus Torvalds
On Wed, 20 Dec 2006, Martin Michlmayr wrote: > * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-20 14:56]: > > page_mkclean_one() fix > > This patch doesn't fix my problem (apt segfaults on ARM because its > database is corrupted). Can you remind us: - your ARM is UP, right? Do you have PREEMPT
