subject:"2.6.19 file content corruption on ext3"

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton

On Sun, 7 Jan 2007 12:36:18 +1030
"Tom Lanyon" <[EMAIL PROTECTED]> wrote:

> On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > What would also actually be interesting is whether somebody can reproduce
> > this on Reiserfs, for example. I _think_ all the reports I've seen are on
> > ext2 or ext3, and if this is somehow writeback-related, it could be some
> > bug that is just shared between the two by virtue of them still having a
> > lot of stuff in common.
> >
> > Linus
> 
> I've been following this thread for a while now as I started
> experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
> am using reiserfs.

reiserfs defaults to data=ordered, so it's quite possibly the same bug.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon


On 1/7/07, Tom Lanyon <[EMAIL PROTECTED]> wrote:
> I've been following this thread for a while now as I started
> experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
> am using reiserfs.

However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far...

--
Tom Lanyon

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon


On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> What would also actually be interesting is whether somebody can reproduce
> this on Reiserfs, for example. I _think_ all the reports I've seen are on
> ext2 or ext3, and if this is somehow writeback-related, it could be some
> bug that is just shared between the two by virtue of them still having a
> lot of stuff in common.
>
> Linus

I've been following this thread for a while now as I started
experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
am using reiserfs.

--
Tom Lanyon

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Dave Jones

On Fri, Dec 29, 2006 at 07:52:15PM +0100, maximilian attems wrote:
 
 > > The only -mm stuff I recall being in the Fedora 2.6.18 is
 > > the inode-diet stuff which ended up in 2.6.19, though the xmas
 > > break has left my head somewhat empty so I may be forgetting something.
 > > What patch in particular are you talking about?
 > 
 > it's no longer visible in the FC6 cvs, due to rebase,
 > but its name was linux-2.6-mm-tracking-dirty-pages.patch
 > it is an earlier amalgam of the merged patch series:
 >    - mm: tracking shared dirty pages
 >    - mm: balance dirty pages
 >    - mm: optimize the new mprotect() code a bit
 >    - mm: small cleanup of install_page()
 >    - mm: fixup do_wp_page()
 >    - mm: msync() cleanup (closes: #394392)

Ohh, that. Yes. I had forgotten all about that.
I've been hitting the nog a little too hard :)

Dave

-- 
http://www.codemonkey.org.uk

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread maximilian attems

On Fri, Dec 29, 2006 at 10:02:53AM -0500, Dave Jones wrote:
> On Fri, Dec 29, 2006 at 10:23:14AM +0100, maximilian attems wrote:
>  > > On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote:

>  > >  > > That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
>  > >  > > (or older)?
>  > >  > 
>  > >  > Well, that was a really _old_ fedora kernel. I guarantee you it didn't 
>  > >  > have the page throttling patches in it, those were written this summer. So 
>  > >  > it would either have to be Fedora carrying around another patch that just 
>  > >  > happens to result in the same corruption for _years_, or it's the same 
>  > >  > bug.
>  > > 
>  > > The only notable VM patch in Fedora kernels of that vintage that I recall
>  > > was Ingo's 4g/4g thing.
>  > 
>  > no the fedora 2.6.18 kernel is affected.
> 
> I wasn't denying that, but Linus was talking about a 2.6.5 Fedora kernel.
> 
>  > it carries the same -mm patches that Debian backported
>  > for LSB 3.1 compliance.
> 
> The only -mm stuff I recall being in the Fedora 2.6.18 is
> the inode-diet stuff which ended up in 2.6.19, though the xmas
> break has left my head somewhat empty so I may be forgetting something.
> What patch in particular are you talking about?

it's no longer visible in the FC6 cvs, due to rebase,
but its name was linux-2.6-mm-tracking-dirty-pages.patch
it is an earlier amalgam of the merged patch series:
   - mm: tracking shared dirty pages
   - mm: balance dirty pages
   - mm: optimize the new mprotect() code a bit
   - mm: small cleanup of install_page()
   - mm: fixup do_wp_page()
   - mm: msync() cleanup (closes: #394392)

--
maks

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Guillaume Chazarain


Linus Torvalds wrote:
> going back to Linux-2.6.5 at least, according to one tester).

I apologize for the confusion, but it just occurred to me that I was
actually experiencing a totally different problem: I set a root filesystem
of 3 MiB for qemu, so the test program just didn't have enough space for
its file.

--
Guillaume


Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread Dave Jones

On Fri, Dec 29, 2006 at 10:23:14AM +0100, maximilian attems wrote:
 > > On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote:
 > >  > 
 > >  > 
 > >  > On Thu, 28 Dec 2006, Petri Kaukasoina wrote:
 > >  > > > me up), and that seems to show the corruption going way way back (ie going 
 > >  > > > back to Linux-2.6.5 at least, according to one tester).
 > >  > > 
 > >  > > That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
 > >  > > (or older)?
 > >  > 
 > >  > Well, that was a really _old_ fedora kernel. I guarantee you it didn't 
 > >  > have the page throttling patches in it, those were written this summer. So 
 > >  > it would either have to be Fedora carrying around another patch that just 
 > >  > happens to result in the same corruption for _years_, or it's the same 
 > >  > bug.
 > > 
 > > The only notable VM patch in Fedora kernels of that vintage that I recall
 > > was Ingo's 4g/4g thing.
 > 
 > no the fedora 2.6.18 kernel is affected.

I wasn't denying that, but Linus was talking about a 2.6.5 Fedora kernel.

 > it carries the same -mm patches that Debian backported
 > for LSB 3.1 compliance.

The only -mm stuff I recall being in the Fedora 2.6.18 is
the inode-diet stuff which ended up in 2.6.19, though the xmas
break has left my head somewhat empty so I may be forgetting something.
What patch in particular are you talking about?

Dave

-- 
http://www.codemonkey.org.uk

Re: 2.6.19 file content corruption on ext3

2006-12-29 Thread maximilian attems

> On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote:
>  > 
>  > 
>  > On Thu, 28 Dec 2006, Petri Kaukasoina wrote:
>  > > > me up), and that seems to show the corruption going way way back (ie going 
>  > > > back to Linux-2.6.5 at least, according to one tester).
>  > > 
>  > > That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
>  > > (or older)?
>  > 
>  > Well, that was a really _old_ fedora kernel. I guarantee you it didn't 
>  > have the page throttling patches in it, those were written this summer. So 
>  > it would either have to be Fedora carrying around another patch that just 
>  > happens to result in the same corruption for _years_, or it's the same 
>  > bug.
> 
> The only notable VM patch in Fedora kernels of that vintage that I recall
> was Ingo's 4g/4g thing.
> 
>   Dave

no the fedora 2.6.18 kernel is affected.
it carries the same -mm patches that Debian backported
for LSB 3.1 compliance.

-- 
maks

ps sorry for stripping cc, only downloaded that message raw.

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Andrew Morton

On Thu, 28 Dec 2006 17:38:38 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> in 
> the hope that somebody else is working on this corruption issue and is 
> interested..

What corruption issue? ;)

I'm finding that the corruption happens trivially with your test app, but
apparently doesn't happen at all with ext2 or ext3, data=writeback.  Maybe
it will happen with increased rarity, but the difference is quite stark.

Removing the

	err = walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE,
				NULL, journal_dirty_data_fn);

from ext3_ordered_writepage() fixes things up.

The things which journal_submit_data_buffers() does after dropping all the
locks are ...  disturbing - I don't think we have sufficient tests in there
to ensure that the buffer is still where we think it is after we retake
locks (they're slippery little buggers).  But that wouldn't explain it
anyway.

It's inefficient that journal_dirty_data() will put these locked, clean
buffers onto BJ_SyncData instead of BJ_Locked, but
journal_submit_data_buffers() seems to dtrt with them.

So no theory yet.  Maybe ext3 is just altering timing.  But the difference
is really large..

Disabling all the WB_SYNC_NONE stuff and making everything go synchronous
everywhere has no effect.  Disabling bdi_write_congested() has no effect.


Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds


Btw, 
 much cleaned-up page tracing patch here, in case anybody cares (and 
"test.c" attached, although I don't think it changed since last time). 

The test.c output is a bit hard to read at times, since it will give 
offsets in bytes as hex (ie "00a77664" means page frame 0a77, and byte 
664h within that page), while the kernel output is obviously the page 
indexes (but the page fault _addresses_ can contain information about the 
exact byte in a page, so you can match them up when some kernel event is 
related to a page fault).
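
The relationship between the two forms can be sketched in a few lines of C. This is only an illustration, assuming 4 KiB pages (so the low three hex digits are the in-page byte offset); the names sketch_page_index() and sketch_page_offset() are hypothetical and appear in neither test.c nor the patch.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits (three hex digits)
 * of a byte offset are the position inside the page. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Page index as the kernel trace prints it: drop the low three hex digits. */
static inline uint32_t sketch_page_index(uint32_t byte_offset)
{
    return byte_offset >> SKETCH_PAGE_SHIFT;
}

/* Byte position inside that page: keep only the low three hex digits. */
static inline uint32_t sketch_page_offset(uint32_t byte_offset)
{
    return byte_offset & (SKETCH_PAGE_SIZE - 1u);
}
```

With the offset quoted above, sketch_page_index(0x00a77664) gives 0xa77 and sketch_page_offset(0x00a77664) gives 0x664.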

So both forms are necessary/logical, but it means that to match things up, 
you often need to ignore the last three hex digits of the address that 
"test.c" outputs.

This one also adds traces for the tags and the writeback activity, but 
since I'm going out for birthday dinner, I won't have time to try to 
actually analyse the trace I have.. Which is why I'm sending it out, in 
the hope that somebody else is working on this corruption issue and is 
interested..

Linus


diff --git a/fs/buffer.c b/fs/buffer.c
index 263f88e..f5e132a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -722,6 +722,7 @@ int __set_page_dirty_buffers(struct page *page)
set_buffer_dirty(bh);
bh = bh->b_this_page;
} while (bh != head);
+   PAGE_TRACE(page, "dirtied buffers");
}
spin_unlock(&mapping->private_lock);
 
@@ -734,6 +735,7 @@ int __set_page_dirty_buffers(struct page *page)
__inc_zone_page_state(page, NR_FILE_DIRTY);
task_io_account_write(PAGE_CACHE_SIZE);
}
+   PAGE_TRACE(page, "setting TAG_DIRTY");
radix_tree_tag_set(&mapping->page_tree,
page_index(page), PAGECACHE_TAG_DIRTY);
}
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 350878a..0cf3dce 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -91,6 +91,14 @@
 #define PG_nosave_free 18  /* Used for system suspend/resume */
 #define PG_buddy   19  /* Page is free, on buddy lists */
 
+#define SetPageInteresting(page) set_bit(PG_arch_1, &(page)->flags)
+#define PageInteresting(page)  test_bit(PG_arch_1, &(page)->flags)
+
+#define PAGE_TRACE(page, msg, arg...) do {				\
+	if (PageInteresting(page))					\
+		printk(KERN_DEBUG "PG %08lx: %s:%d " msg "\n",		\
+			(page)->index, __FILE__, __LINE__ ,##arg );	\
+} while (0)
 
 #if (BITS_PER_LONG > 32)
 /*
@@ -183,32 +191,38 @@ static inline void SetPageUptodate(struct page *page)
 #define PageWriteback(page)test_bit(PG_writeback, &(page)->flags)
 #define SetPageWriteback(page) \
do {\
-   if (!test_and_set_bit(PG_writeback, \
-   &(page)->flags))\
+   if (!test_and_set_bit(PG_writeback, &(page)->flags)) {  \
+   PAGE_TRACE(page, "set writeback");  \
inc_zone_page_state(page, NR_WRITEBACK);\
+   }   \
} while (0)
 #define TestSetPageWriteback(page) \
({  \
int ret;\
ret = test_and_set_bit(PG_writeback,\
&(page)->flags);\
-   if (!ret)   \
+   if (!ret) { \
+   PAGE_TRACE(page, "set writeback");  \
inc_zone_page_state(page, NR_WRITEBACK);\
+   }   \
ret;\
})
 #define ClearPageWriteback(page)   \
do {\
-   if (test_and_clear_bit(PG_writeback,\
-   &(page)->flags))\
+   if (test_and_clear_bit(PG_writeback, &(page)->flags)) { \
+   PAGE_TRACE(page, "end writeback");  \
dec_zone_page_state(page, NR_WRITEBACK);\
+   }   \
} while (0)
 #define TestClearPageWriteback(page)   \
({

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Anton Altaparmakov

On Thu, 28 Dec 2006, Linus Torvalds wrote:
> Ok,
>  with the ugly trace capture patch, I've actually captured this corruption 
> in action, I think.
> 
> I did a full trace of all pages involved in one run, and picked one 
> corruption at random:
> 
>   Chunk 14465 corrupted (0-75)  (01423fb4-01423fff)
>   Expected 129, got 0
>   Written as (5126)9509(15017)
> 
> That's the first 76 bytes of a chunk missing, and it's the last 76 bytes 
> on a page. It's page index 01423 in the mapped file, and bytes fb4-fff 
> within that page.
> 
> There were four chunks written to that page:
> 
>   Writing chunk 14463/15800 (15%) (0142344c) (1)
>   Writing chunk 14462/15800 (30%) (01422e98) (2) (overflows into 1423)
>   Writing chunk 14464/15800 (32%) (01423a00) (3)
>   Writing chunk 14465/15800 (60%) (01423fb4) (4)  <--- LOST!
> 
> and the other three chunks checked out all right.
> 
> And here's the annotated trace as it concerns that page:
> 
>  - here we write the first chunk to the page:
>   ** (1)  do_no_page: mapping index 1423 at b7d1f44c (write)
>   **  Setting page 1423 dirty
> 
>  - something flushes it out to disk:
>   **  cpd_for_io: index 1423
>   **  cleaning index 1423 at b7d1f000
> 
>  - here we write the second chunk (which was split over the previous page 
>and the interesting one):
>   ** (2)  Setting page 1422 dirty
>   ** (2)  Setting page 1423 dirty
> 
>  - and here we do a cleaning event
>   **  cpd_for_io: index 1423
>   **  cleaning index 1423 at b7d1f000
> 
>  - here we write the third chunk:
>   ** (3)  Setting page 1423 dirty
> 
>  - here we write the fourth chunk:
>   ** (4) NO DIRTY EVENT
> 
>  - and a third flush to disk: 
>   **  cpd_for_io: index 1423
>   **  cleaning index 1423 at b7d1f000
> 
>  - here we unmap and flush:
>   **  Unmapped index 1423 at b7d1f000
>   **  Removing index 1423 from page cache
> 
>  - here we remap to check:
>   **  do_no_page: mapping index 1423 at b7d1f000 (read)
>   **  Unmapped index 1423 at b7d1f000
> 
>  - and finally, here I remove the file after the run:
>   **  Removing index 1423 from page cache
> 
> Now, the important thing to see here is:
> 
>  - the missing write did not have a "Setting page 1423 dirty" event 
>associated with it.
> 
>  - but I can _see_ where the actual dirty event would be happening in the 
>logs, because I can see the dirty events of the other chunk writes 
>around it, so I know exactly where that fourth write happens. And 
>indeed, it _shouldn't_ get a dirty event, because the page is still 
>dirty from the write of chunk #3 to that page, which _did_ get a dirty 
>event.
> 
>I can see that, because the testing app writes the log of the pages it 
>writes, and this is the log around the fourth and final write:
> 
>   ...
> Writing chunk 5338/15800 (60%) (0076eb48)   PFN: 76e/76f
> Writing chunk 960/15800 (60%) (00156300)PFN: 156
> Writing chunk 14465/15800 (60%) (01423fb4)  <
> Writing chunk 8594/15800 (60%) (00bf74a8)   PFN: bf7
> Writing chunk 556/15800 (60%) (000c62f0)PFN: c6
>   Writing chunk 15190/15800 (60%) (01526678)  PFN: 1526
>   ...
> 
>and I can match this up with the full log from the kernel, which looks 
>like this:
> 
> Setting page 076e dirty
> Setting page 076f dirty
> Setting page 0156 dirty
> Setting page 00c6 dirty
>   Setting page 1526 dirty
> 
>so I know exactly where the missing writes (to our page at pfn 1423, 
>and the pfn-bf7 page) happened.
> 
>  - and the thing is, I can see a "cpd_for_io()" happening AFTER that 
>fourth write. Quite a long while after, in fact. So all of this looks 
>very fine indeed. We are not losing any dirty bits.
> 
>  - EVEN MORE INTERESTING: write 3 makes it onto disk, and it really uses 
>the SAME dirty bit as write 4 did (which didn't make it out to disk!). 
>The event that clears the dirty bit that write 3 did happens AFTER 
>write 4 has happened!
> 
> So if we're not losing any dirty bits, what's going on?
> 
> I think we have some nasty interaction with the buffer heads. In 

But are chunks 3 and 4 in separate buffer heads?  Sorry could not see it 
immediately from the output you showed...

It is just that there may be a different cause rather than buffer dirty 
state...

A shot in the dark I know but it could perhaps be that a "COW for 
MAP_PRIVATE" like event happens when the page is dirty already thus the 
second write never actually makes it to the shared page thus it never gets 
written out.

I am almost certainly totally barking up the wrong tree but I thought it 
may be worth mentioning just in case there was a slip in the COW logic or 
page wr

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2006, Anton Altaparmakov wrote:
> 
> But are chunks 3 and 4 in separate buffer heads?  Sorry could not see it 
> immediately from the output you showed...

No, this is a 4kB filesystem. A single bh per page.

> It is just that there may be a different cause rather than buffer dirty 
> state...

Sure.

> A shot in the dark I know but it could perhaps be that a "COW for 
> MAP_PRIVATE" like event happens when the page is dirty already thus the 
> second write never actually makes it to the shared page thus it never gets 
> written out.

There are no private mappings anywhere, and no forks. Just a single mmap 
(well, we unmap and remap in order to force the page cache to be 
invalidated properly with the posix_fadvise() thing, but that's literally 
the only user).
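
For readers without the attachment, the access pattern described here (one shared mapping, then unmap, drop the cache, and remap to check) might look roughly like the sketch below. It is not test.c: the function name sketch_mmap_roundtrip(), the fill byte, and the msync() call before the unmap are all assumptions made for illustration. Only mmap(), munmap() and posix_fadvise() are named in the thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() and ftruncate() */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write a pattern through a MAP_SHARED mapping, drop the mapping and
 * (advisorily) the cached pages, then map the file again and compare.
 * Returns 0 if the data reads back intact, 1 on mismatch, -1 on error.
 * The file at 'path' is created and removed again by this function.
 */
int sketch_mmap_roundtrip(const char *path, size_t len)
{
    unsigned char *map;
    int rc = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    memset(map, 0x81, len);           /* dirty every page via the mapping */

    /* Assumption: flush first so the pages are clean and can be dropped. */
    if (msync(map, len, MS_SYNC) != 0) {
        munmap(map, len);
        goto out;
    }
    munmap(map, len);

    /* Advice only: the kernel may or may not drop the cached pages. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    rc = 0;
    for (size_t i = 0; i < len; i++) {
        if (map[i] != 0x81) {         /* a lost write would show up here */
            rc = 1;
            break;
        }
    }
    munmap(map, len);
out:
    close(fd);
    unlink(path);
    return rc;
}
```

A single sequential fill like this is very unlikely to trigger the corruption by itself; the thread's test writes thousands of chunks in scattered order. The sketch only shows the map, unmap, fadvise, remap shape of the check.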

Linus

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds

On Thu, 28 Dec 2006, David Miller wrote:
> 
> What happens when we writeback, to the PTEs?

Not a damn thing.

We clear the PTE's _before_ we even start the write. The writeback does 
nothing to them. If the user dirties the page while writeback is in 
progress, we'll take the page fault and re-dirty it _again_.

> page_mkclean_file() iterates the VMAs and when it finds a shared
> one it goes:
> 
>   entry = ptep_clear_flush(vma, address, pte);
>   entry = pte_wrprotect(entry);
>   entry = pte_mkclean(entry);
> 
> and that's fine, but that PTE is still marked writable, and
> I think that's key.

No it's not. It's right there. "pte_wrprotect(entry)". You even copied it 
yourself.

> What does the fault path do in this situation?
> 
>   if (write_access) {
>   if (!pte_write(entry))
>   return do_wp_page(mm, vma, address,
>   pte, pmd, ptl, entry);

So we call "do_wp_page()", and that does everything right.
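
The sequence argued above can be written down as a toy state machine. This is a userspace model for illustration only, not kernel code: struct toy_pte, struct toy_page and the toy_*() helpers are hypothetical names, and the only behaviour modelled is what the text states (write-protect and clean the PTE before writeback, fault on the next store, re-dirty the page in the write-protect fault path).

```c
#include <stdbool.h>

/* Toy model: one page mapped by exactly one shared PTE. */
struct toy_pte  { bool present, writable, dirty; };
struct toy_page { bool dirty; };

/* The three quoted steps: the entry ends up write-protected and clean. */
static void toy_page_mkclean(struct toy_pte *pte)
{
    pte->writable = false;   /* stands in for pte_wrprotect() */
    pte->dirty = false;      /* stands in for pte_mkclean()   */
}

/* Starting writeback: clear the page's dirty state and clean the PTE. */
static void toy_start_writeback(struct toy_pte *pte, struct toy_page *page)
{
    toy_page_mkclean(pte);
    page->dirty = false;
}

/* A user store. Returns true if it had to go through the fault path. */
static bool toy_user_write(struct toy_pte *pte, struct toy_page *page)
{
    if (!pte->writable) {
        /* stands in for do_wp_page(): re-dirty the page, allow writes */
        page->dirty = true;
        pte->writable = true;
        pte->dirty = true;
        return true;
    }
    pte->dirty = true;       /* no fault: only the PTE dirty bit changes */
    return false;
}
```

In this model the first store after toy_start_writeback() always faults and sets page->dirty again; later stores do not fault, but the page is already dirty by then.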

Linus

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread David Miller

From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Thu, 28 Dec 2006 14:37:37 -0800 (PST)

> So if we're not losing any dirty bits, what's going on?

What happens when we writeback, to the PTEs?

page_mkclean_file() iterates the VMAs and when it finds a shared
one it goes:

entry = ptep_clear_flush(vma, address, pte);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);

and that's fine, but that PTE is still marked writable, and
I think that's key.

What does the fault path do in this situation?

if (write_access) {
	if (!pte_write(entry))
		return do_wp_page(mm, vma, address,
				pte, pmd, ptl, entry);
	entry = pte_mkdirty(entry);
}

It does nothing to update the page dirty state, because it's
writable, it just sets the PTE dirty bit and that's it.  Should
it be setting the page dirty here for SHARED cases?

So until vmscan actually unmaps the PTE completely, we have this
window in which the application can write to the PTE and the
page dirty state doesn't get updated.

Perhaps something later cleans up after this, f.e. by rechecking the
PTE dirty bit at the end of I/O or when vmscan unmaps the page.
I guess that should handle things, but the above logic definitely
stood out to me.

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds


Ok,
 with the ugly trace capture patch, I've actually captured this corruption 
in action, I think.

I did a full trace of all pages involved in one run, and picked one 
corruption at random:

Chunk 14465 corrupted (0-75)  (01423fb4-01423fff)
Expected 129, got 0
Written as (5126)9509(15017)

That's the first 76 bytes of a chunk missing, and it's the last 76 bytes 
on a page. It's page index 01423 in the mapped file, and bytes fb4-fff 
within that page.
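
The arithmetic behind those numbers can be checked with a short sketch. It assumes 4 KiB pages and an inclusive byte range as printed by the test program; sketch_range_len() and sketch_is_page_end() are hypothetical helper names.

```c
#include <stdint.h>

#define RANGE_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Number of bytes in the inclusive range [first, last]. */
static inline uint32_t sketch_range_len(uint32_t first, uint32_t last)
{
    return last - first + 1u;
}

/* Nonzero if 'last' is the final byte of the page that contains it. */
static inline int sketch_is_page_end(uint32_t last)
{
    return (last % RANGE_PAGE_SIZE) == RANGE_PAGE_SIZE - 1u;
}
```

For the range 01423fb4-01423fff, sketch_range_len() gives 0x4c = 76 and sketch_is_page_end(0x01423fff) is true, so the lost bytes are exactly the tail of page 0x1423 and the rest of the chunk landed on the next page.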

There were four chunks written to that page:

Writing chunk 14463/15800 (15%) (0142344c) (1)
Writing chunk 14462/15800 (30%) (01422e98) (2) (overflows into 1423)
Writing chunk 14464/15800 (32%) (01423a00) (3)
Writing chunk 14465/15800 (60%) (01423fb4) (4)  <--- LOST!

and the other three chunks checked out all right.

And here's the annotated trace as it concerns that page:

 - here we write the first chunk to the page:
** (1)  do_no_page: mapping index 1423 at b7d1f44c (write)
**  Setting page 1423 dirty

 - something flushes it out to disk:
**  cpd_for_io: index 1423
**  cleaning index 1423 at b7d1f000

 - here we write the second chunk (which was split over the previous page 
   and the interesting one):
** (2)  Setting page 1422 dirty
** (2)  Setting page 1423 dirty

 - and here we do a cleaning event
**  cpd_for_io: index 1423
**  cleaning index 1423 at b7d1f000

 - here we write the third chunk:
** (3)  Setting page 1423 dirty

 - here we write the fourth chunk:
** (4) NO DIRTY EVENT

 - and a third flush to disk: 
**  cpd_for_io: index 1423
**  cleaning index 1423 at b7d1f000

 - here we unmap and flush:
**  Unmapped index 1423 at b7d1f000
**  Removing index 1423 from page cache

 - here we remap to check:
**  do_no_page: mapping index 1423 at b7d1f000 (read)
**  Unmapped index 1423 at b7d1f000

 - and finally, here I remove the file after the run:
**  Removing index 1423 from page cache

Now, the important thing to see here is:

 - the missing write did not have a "Setting page 1423 dirty" event 
   associated with it.

 - but I can _see_ where the actual dirty event would be happening in the 
   logs, because I can see the dirty events of the other chunk writes 
   around it, so I know exactly where that fourth write happens. And 
   indeed, it _shouldn't_ get a dirty event, because the page is still 
   dirty from the write of chunk #3 to that page, which _did_ get a dirty 
   event.

   I can see that, because the testing app writes the log of the pages it 
   writes, and this is the log around the fourth and final write:

...
Writing chunk 5338/15800 (60%) (0076eb48)   PFN: 76e/76f
Writing chunk 960/15800 (60%) (00156300)PFN: 156
Writing chunk 14465/15800 (60%) (01423fb4)  <
Writing chunk 8594/15800 (60%) (00bf74a8)   PFN: bf7
Writing chunk 556/15800 (60%) (000c62f0)PFN: c6
Writing chunk 15190/15800 (60%) (01526678)  PFN: 1526
...

   and I can match this up with the full log from the kernel, which looks 
   like this:

Setting page 076e dirty
Setting page 076f dirty
Setting page 0156 dirty
Setting page 00c6 dirty
Setting page 1526 dirty

   so I know exactly where the missing writes (to our page at pfn 1423, 
   and the pfn-bf7 page) happened.

 - and the thing is, I can see a "cpd_for_io()" happening AFTER that 
   fourth write. Quite a long while after, in fact. So all of this looks 
   very fine indeed. We are not losing any dirty bits.

 - EVEN MORE INTERESTING: write 3 makes it onto disk, and it really uses 
   the SAME dirty bit as write 4 did (which didn't make it out to disk!). 
   The event that clears the dirty bit that write 3 did happens AFTER 
   write 4 has happened!

So if we're not losing any dirty bits, what's going on?

I think we have some nasty interaction with the buffer heads. In 
particular, I don't think it's the dirty page bits that are broken (I 
_see_ that the PageDirty bit was set after write 4 was done to memory in 
the kernel traces). So I think that a real writeback just doesn't happen, 
because somebody has marked the buffer heads clean _after_ it started IO 
on them.
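
To make that hypothesis concrete, here is a toy model of the suspected ordering. It is a sketch of the guess stated above, not of any real kernel path, and it makes no claim about the actual root cause: struct toy_state and the toy_*() functions are hypothetical, and a single buffer head per page is assumed (4 kB blocks).

```c
#include <stdbool.h>

struct toy_state {
    int  mem;         /* what the page holds in memory           */
    int  disk;        /* what has been handed to the disk so far */
    bool page_dirty;
    bool bh_dirty;    /* the page's single buffer head           */
};

/* IO is started on the buffer: the disk gets what memory holds right now. */
static void toy_submit_io(struct toy_state *s)
{
    s->disk = s->mem;
}

/* A store to an already-dirty, already-writable page: no fault,
 * so no new dirty event and no change to the buffer state. */
static void toy_store_no_fault(struct toy_state *s, int value)
{
    s->mem = value;
}

/* The suspected step: the buffer is marked clean after IO was started. */
static void toy_mark_bh_clean_late(struct toy_state *s)
{
    s->bh_dirty = false;
}

/* Writeback of the page: only dirty buffers are actually written. */
static bool toy_writepage(struct toy_state *s)
{
    if (!s->page_dirty)
        return false;
    s->page_dirty = false;
    if (!s->bh_dirty)
        return false;          /* page was dirty, but nothing to write */
    s->bh_dirty = false;
    s->disk = s->mem;
    return true;
}
```

Starting from mem = 3 with both dirty flags set, the order toy_submit_io(), toy_store_no_fault(4), toy_mark_bh_clean_late(), toy_writepage() leaves disk at 3 while memory holds 4. That matches the shape of the trace: write 3 reaches the disk, write 4 does not, and the page dirty bit was never lost.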

I think "__mpage_writepage()" is buggy in this regard, for example. It 
even has a comment about its crapola behaviour:

/*
 * Must try to add the page before marking the buffer clean or
 * the confused fail path above (OOM) will be very confused when
 * it finds all bh marked clean (i.e. it will not write anything)
 */

however, I don't think that particular thing explains it, because I don't 
think we use that function for the cases I'm looking at

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Russell King

On Thu, Dec 28, 2006 at 01:24:30PM -0800, Linus Torvalds wrote:
> On Thu, 28 Dec 2006, Linus Torvalds wrote:
> > 
> > What we need now is actually looking at the source code, and people who 
> > understand the VM, I'm afraid. I'm gathering traces now that I have a good 
> > test-case. I'll post my trace tools once I've tested that they work, in 
> > case others want to help.
> 
> Ok, I've got the traces, but quite frankly, I doubt anybody is crazy 
> enough to want to trawl through them. It's a bit painful, since we're 
> talking thousands of pages to trigger this problem.
> 
> Also, I've used the PG_arch_1 flag, which is fine on x86[-64] and probably 
> ARM, but is used for other things on ia64, powerpc and sparc64. But here's 
> the patch in case anybody cares.

PG_arch_1 is used on ARM to flag pages that need a dcache flush prior to
hitting userspace, in the same way that sparc64 uses it.  So ARM systems
should not have this patch applied.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2006, Linus Torvalds wrote:
> 
> What we need now is actually looking at the source code, and people who 
> understand the VM, I'm afraid. I'm gathering traces now that I have a good 
> test-case. I'll post my trace tools once I've tested that they work, in 
> case others want to help.

Ok, I've got the traces, but quite frankly, I doubt anybody is crazy 
enough to want to trawl through them. It's a bit painful, since we're 
talking thousands of pages to trigger this problem.

Also, I've used the PG_arch_1 flag, which is fine on x86[-64] and probably 
ARM, but is used for other things on ia64, powerpc and sparc64. But here's 
the patch in case anybody cares.

It wants a _big_ kernel buffer to capture all the crud into (which is why 
I made the thing accept a bigger log buffer), and quite frankly, I'm not 
at all sure that all the locking is ok (ie I could imagine that the 
dcache-locking thing there in "is_interesting()" could deadlock, what do I 
know..)
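
For a sense of scale: the Kconfig prompt in the patch below states that a shift of 16 means 64KB and 17 means 128KB, i.e. the buffer is 2^shift bytes. A minimal sketch of that arithmetic, with the hypothetical helper name sketch_log_buf_bytes():

```c
#include <stddef.h>

/* Size in bytes of a log buffer configured with the given shift: 2^shift. */
static inline size_t sketch_log_buf_bytes(unsigned int shift)
{
    return (size_t)1 << shift;
}
```

The old upper bound of 21 therefore allows 2 MiB, and the raised bound of 24 allows 16 MiB.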

But I've captured some real data with this, which I'll describe 
separately.

Linus


diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 350878a..967dd80 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -91,6 +91,8 @@
 #define PG_nosave_free 18  /* Used for system suspend/resume */
 #define PG_buddy   19  /* Page is free, on buddy lists */
 
+#define SetPageInteresting(page) set_bit(PG_arch_1, &(page)->flags)
+#define PageInteresting(page)  test_bit(PG_arch_1, &(page)->flags)
 
 #if (BITS_PER_LONG > 32)
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5c26818..7735b83 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -79,7 +79,7 @@ config DEBUG_KERNEL
 
 config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)" if DEBUG_KERNEL
-   range 12 21
+   range 12 24
default 17 if S390 || LOCKDEP
default 16 if X86_NUMAQ || IA64
default 15 if SMP
diff --git a/mm/filemap.c b/mm/filemap.c
index 8332c77..d6a0f56 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -116,6 +116,7 @@ void __remove_from_page_cache(struct page *page)
 {
struct address_space *mapping = page->mapping;
 
+if (PageInteresting(page)) printk("Removing index %08x from page cache\n", page->index);
radix_tree_delete(&mapping->page_tree, page->index);
page->mapping = NULL;
mapping->nrpages--;
@@ -421,6 +422,23 @@ int filemap_write_and_wait_range(struct address_space *mapping,
return err;
 }
 
+static noinline int is_interesting(struct address_space *mapping)
+{
+   struct inode *inode = mapping->host;
+   struct dentry *dentry;
+   int retval = 0;
+
+   spin_lock(&dcache_lock);
+   list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+   if (strcmp(dentry->d_name.name, "mapfile"))
+   continue;
+   retval = 1;
+   break;
+   }
+   spin_unlock(&dcache_lock);
+   return retval;
+}
+
 /**
  * add_to_page_cache - add newly allocated pagecache pages
  * @page:  page to add
@@ -439,6 +457,9 @@ int add_to_page_cache(struct page *page, struct address_space *mapping,
 {
int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
 
+   if (is_interesting(mapping))
+   SetPageInteresting(page);
+
if (error == 0) {
write_lock_irq(&mapping->tree_lock);
error = radix_tree_insert(&mapping->page_tree, offset, page);
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..14c9815 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -667,6 +667,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
tlb_remove_tlb_entry(tlb, pte, addr);
if (unlikely(!page))
continue;
+if (PageInteresting(page))
+   printk("Unmapped index %08x at %08x\n", page->index, addr);
if (unlikely(details) && details->nonlinear_vma
&& linear_page_index(details->nonlinear_vma,
addr) != page->index)
@@ -1605,6 +1607,7 @@ gotten:
 */
ptep_clear_flush(vma, address, page_table);
set_pte_at(mm, address, page_table, entry);
+if (PageInteresting(new_page)) printk("do_wp_page: mapping index %08x at %08lx\n", new_page->index, address);
update_mmu_cache(vma, address, entry);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
@@ -2249,6 +2252,7 @@ retry:
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+if (PageInteresting(new_page)) printk("do_no_page: mapping index %08x at %08lx (%s)\n", new_page->index, address, write_access ? "write" : "read");

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Arjan van de Ven

On Thu, 2006-12-28 at 14:39 -0500, Dave Jones wrote:
> On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote:
>  > 
>  > 
>  > On Thu, 28 Dec 2006, Petri Kaukasoina wrote:
>  > > > me up), and that seems to show the corruption going way way back (ie going
>  > > > back to Linux-2.6.5 at least, according to one tester).
>  > > 
>  > > That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
>  > > (or older)?
>  > 
>  > Well, that was a really _old_ fedora kernel. I guarantee you it didn't 
>  > have the page throttling patches in it, those were written this summer. So 
>  > it would either have to be Fedora carrying around another patch that just 
>  > happens to result in the same corruption for _years_, or it's the same 
>  > bug.
> 
> The only notable VM patch in Fedora kernels of that vintage that I recall
> was Ingo's 4g/4g thing.

which does tlb flushes *all the time* so that even rules out (well
almost) a stale tlb somewhere...



Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Dave Jones

On Thu, Dec 28, 2006 at 11:21:21AM -0800, Linus Torvalds wrote:
 > 
 > 
 > On Thu, 28 Dec 2006, Petri Kaukasoina wrote:
 > > > me up), and that seems to show the corruption going way way back (ie going
 > > > back to Linux-2.6.5 at least, according to one tester).
 > > 
 > > That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
 > > (or older)?
 > 
 > Well, that was a really _old_ fedora kernel. I guarantee you it didn't 
 > have the page throttling patches in it, those were written this summer. So 
 > it would either have to be Fedora carrying around another patch that just 
 > happens to result in the same corruption for _years_, or it's the same 
 > bug.

The only notable VM patch in Fedora kernels of that vintage that I recall
was Ingo's 4g/4g thing.

Dave

-- 
http://www.codemonkey.org.uk

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds

On Thu, 28 Dec 2006, Petri Kaukasoina wrote:
> > me up), and that seems to show the corruption going way way back (ie going 
> > back to Linux-2.6.5 at least, according to one tester).
> 
> That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
> (or older)?

Well, that was a really _old_ fedora kernel. I guarantee you it didn't 
have the page throttling patches in it, those were written this summer. So 
it would either have to be Fedora carrying around another patch that just 
happens to result in the same corruption for _years_, or it's the same 
bug.

I bet it's the same bug, and it's been around for ages.

Linus

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Petri Kaukasoina

On Thu, Dec 28, 2006 at 11:00:46AM -0800, Linus Torvalds wrote:
> And I have a test-program that shows the corruption _much_ easier (at 
> least according to my own testing, and that of several reporters that back 
> me up), and that seems to show the corruption going way way back (ie going 
> back to Linux-2.6.5 at least, according to one tester).

That was a Fedora kernel. Has anyone seen the corruption in vanilla 2.6.18
(or older)?

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Linus Torvalds

On Thu, 28 Dec 2006, Marc Haber wrote:
> 
> After being up for ten days, I have now encountered the file
> corruption of pkgcache.bin for the first time again. The 256 MB i386
> box is like 26M in swap, is under very moderate load.
> 
> I am running plain vanilla 2.6.19.1. Is there a patch that I should
> apply against 2.6.19.1 that would help in debugging?

Not right now. 

And I have a test-program that shows the corruption _much_ easier (at 
least according to my own testing, and that of several reporters that back 
me up), and that seems to show the corruption going way way back (ie going 
back to Linux-2.6.5 at least, according to one tester).

So it just got a lot _easier_ to trigger in 2.6.19, but it's not a new 
bug.

What we need now is actually looking at the source code, and people who 
understand the VM, I'm afraid. I'm gathering traces now that I have a good 
test-case. I'll post my trace tools once I've tested that they work, in 
case others want to help.

(And hey, you don't have to be a VM expert to help: this could be a 
learning experience. However, I'll warn you: this is _the_ most grotty 
part of the whole kernel. It's not even ugly, it's just damn hard and 
complex).

Linus

Re: 2.6.19 file content corruption on ext3

2006-12-28 Thread Marc Haber

On Tue, Dec 19, 2006 at 09:51:49AM +0100, Marc Haber wrote:
> On Sun, Dec 17, 2006 at 09:43:08PM -0800, Andrew Morton wrote:
> > Six hours here of fsx-linux plus high memory pressure on SMP on 1k
> > blocksize ext3, mainline.  Zero failures.  It's unlikely that this testing
> > would pass, yet people running normal workloads are able to easily trigger
> > failures.  I suspect we're looking in the wrong place.
> 
> I do not have a clue about memory management at all, but is it
> possible that you're testing on a box with too much memory? My box has
> only 256 MB, and I used to use mutt with a _huge_ inbox with mutt
> taking somewhat 150 MB. Add spamassassin and a reasonably busy mail
> server, and the box used to be like 150 MB in swap.
> 
> I have tidied my inbox in the mean time and mutt's memory requirement
> has been reduced to somewhat 30 MB, which might be the cause that I
> don't see the issue that often any more.

After being up for ten days, I have now encountered the file
corruption of pkgcache.bin for the first time again. The 256 MB i386
box is like 26M in swap, is under very moderate load.

I am running plain vanilla 2.6.19.1. Is there a patch that I should
apply against 2.6.19.1 that would help in debugging?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 72739835

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds

On Thu, 28 Dec 2006, Martin Schwidefsky wrote:
> 
> For s390 there are two aspects to consider:
> 1) the pte values are 100% software controlled.

That's fine. In that situation, you shouldn't need any atomic ops at all, 
I think all our sw page-table operations are already done under the pte 
lock. 

The reason x86 needs to be careful is exactly the fact that the hardware 
will obviously do a lot on its own, and the hardware is _not_ going to 
honor our page table locking ;)

In an all-sw situation, a lot of this should be easier. S390 has _other_ 
things that are inconvenient (the strange "dirty bit is not in the page 
tables" thing that makes it look different from everybody else), but hey, 
it's a balance..

So for s390, ptep_exchange() in my example should be able to be a simple 
"load old value and store new one", assuming everybody honors the pte lock 
(and they _should_).

Linus
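
A minimal user-space sketch of the "load old value and store new one"
idea above, not kernel code. It assumes every software update of the
entry is serialized by one lock and that nothing else (no hardware)
modifies the entry behind the lock's back. The pte_t typedef, the mutex
standing in for the pte lock, and both function names are illustrative
assumptions, not the kernel's API.

```c
#include <pthread.h>

/* Hypothetical stand-ins: a PTE is modelled as a plain integer and the
 * pte lock as a pthread mutex. */
typedef unsigned long pte_t;

static pthread_mutex_t pte_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold pte_lock. Because all updates are serialized by the
 * lock, no atomic instruction is needed: load the old value, then store
 * the new one. */
static pte_t ptep_exchange_locked(pte_t *ptep, pte_t newval)
{
	pte_t old = *ptep;

	*ptep = newval;
	return old;
}

/* Convenience wrapper that takes and drops the lock around the exchange. */
static pte_t ptep_exchange(pte_t *ptep, pte_t newval)
{
	pte_t old;

	pthread_mutex_lock(&pte_lock);
	old = ptep_exchange_locked(ptep, newval);
	pthread_mutex_unlock(&pte_lock);
	return old;
}
```

On hardware that sets bits in the entry on its own, as described for
x86 above, the plain load and store would not be enough and a real
atomic exchange would be needed instead.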

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky

On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote:
> What do you guys think? Does something like this work out for S/390 too? I
> tried to make that "ptep_flush_dirty()" concept work for architectures
> that hide the dirty bit somewhere else too, but..

For s390 there are two aspects to consider:
1) the pte values are 100% software controlled. They only change because
a cpu stored a value to it or issued one of the specialized instructions
(csp, ipte and idte). The ptep_flush_dirty would be a nop for s390.
2) ptep_exchange is a bit dangerous. For s390 we need a lock that
protects the software controlled updates of the ptes. The reason is the
ipte instruction. It is implemented by the machine microcode in a
non-atomic way in regard to the memory. It reads the byte of the pte
that contains the invalid bit, flushes the tlb entries for it and then
writes back the byte with the invalid bit set. The microcode makes sure
that this pte cannot be used to form a new tlb entry on any cpu while the
ipte is in progress.
That means a compare-and-swap semantics on ptes won't work together with
the ipte optimization. As long as there is the pte lock that protects
all software accesses to the pte we are fine. But if any code expects
that ptep_exchange does something like an xchg things break.

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell


On 12/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I do get this error on reiserfs ( old one, didn't try on reiser4 ).
> Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the
> debian bts.


I've had reports of corrupted data on earlier kernel releases with
reiserfs3, which were fixed by upgrading to reiserfs4.

Jari Sundell

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn

On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote:
> What would also actually be interesting is whether somebody can reproduce 
> this on Reiserfs, for example. I _think_ all the reports I've seen are on 
> ext2 or ext3, and if this is somehow writeback-related, it could be some 
> bug that is just shared between the two by virtue of them still having a 
> lot of stuff in common. 
> 
>   Linus
I do get this error on reiserfs ( old one, didn't try on reiser4 ). 
Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the
debian bts.

flo attenberger

---
Linux master 2.6.19 #1 PREEMPT Thu Dec 21 10:55:34 CET 2006 x86_64
GNU/Linux

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.19
# Thu Dec 21 10:45:05 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_KEXEC=y
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x20
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_REORDER=y
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SYSFS_DEPRECATED=y
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
CONFIG_ACPI_AC=m
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_HOTKEY=m
CONFIG_ACPI_FAN=m
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_THERMAL=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_P

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell


On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>  - It never uses mprotect on the shared mappings, but it _does_ do:
>         "mincore()" - but the return values don't much matter (it's used
>                 as a heuristic on which parts to hash, apparently)
>
>                 I double- and triple-checked this one, because I
>                 did make changes to "mincore()", but those didn't go
>                 into the affected kernels anyway (ie they are not in
>                 plain 2.6.19, nor in 2.6.18.3 either)

Correct, mincore is only used to check if it should delay the hash checking.

>         "madvise(MADV_WILLNEED)"
>         "msync(MS_ASYNC)" (or MS_SYNC if you use a command line flag)
>         "munmap()" of course
>
>  - it never seems to mix mmap() and write() - it does _only_ mmap.
>
>  - it seems to mmap/munmap the shared files in nice 64-page chunks, all
>    64-page aligned in the file (ie it does NOT create one big mapping, it
>    has some kind of LRU of these 64-page chunks). The only exception being
>    the last chunk, which it maps byte-accurate to the size.

The length of the chunks is only page aligned on single file torrents,
not so on multi-file torrents. I've attached a patch for rtorrent that
will extend the length to the page boundary.

>  - I haven't checked whether it only ever has the same chunk mapped once
>    at a time.

This should be the case, but two mapped chunks may share a page,
sometimes with different r/w permissions.

Jari Sundell


extend_mapping.diff
Description: Binary data

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds

On Tue, 26 Dec 2006, Nick Piggin wrote:

> Linus Torvalds wrote:
> > 
> > Ok, so how about this diff.
> > 
> > I'm actually feeling good about this one. It really looks like
> > "do_no_page()" was simply buggy, and that this explains everything.
> 
> Still trying to catch up here, so I'm not going to reply to any old
> stuff and just start at the tip of the thread... Other than to say
> that I really like cancel_page_dirty ;)

Yeah, I think that part is a bit clearer about what's going on now.

> I think your patch is quite right so that's a good catch.

Actually, since people told me it didn't matter, I went back and looked at 
_why_ - the thing is, "vma->vm_page_prot" should always be read-only 
anyway, except for mappings that don't do dirty accounting at all, so I 
think my patch only found cases that are unimportant (ie pages that get 
faulted in on filesystems like ramfs that don't do any dirty page 
accounting because they're all dirty anyway).

> But I'm not too surprised that it does not help the problem, because I 
> don't think we have started shedding any old pte_dirty tests at 
> unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, 
> as such.

True. We should no longer _need_ those dirty bit reclaims at 
unmap/reclaim, but we still do them, so you're right, even if we were 
buggy in this area, it should only really matter for the dirty page 
counting, not for any lost data.

> I was hoping that you've almost narrowed it down to the filesystem
> writeback code, with the last few mails?

I think so, yes.

However, I've checked, and "rtorrent" really does seem to be fairly 
well-behaved wrt any filesystem activity. It does

 - no threading. It's 100% single-threaded, and doesn't even appear to use 
   signals.

 - exactly _one_ "ftruncate()", and it does it at the beginning, for the 
   full final size.

   IOW, it's not anything subtle with truncate and dirty page cancel.

 - It never uses mprotect on the shared mappings, but it _does_ do:
        "mincore()" - but the return values don't much matter (it's used
                as a heuristic on which parts to hash, apparently)

                I double- and triple-checked this one, because I
                did make changes to "mincore()", but those didn't go
                into the affected kernels anyway (ie they are not in
                plain 2.6.19, nor in 2.6.18.3 either)

        "madvise(MADV_WILLNEED)"
        "msync(MS_ASYNC)" (or MS_SYNC if you use a command line flag)
        "munmap()" of course

 - it never seems to mix mmap() and write() - it does _only_ mmap.

 - it seems to mmap/munmap the shared files in nice 64-page chunks, all 
   64-page aligned in the file (ie it does NOT create one big mapping, it 
   has some kind of LRU of these 64-page chunks). The only exception being 
   the last chunk, which it maps byte-accurate to the size.

 - I haven't checked whether it only ever has the same chunk mapped once 
   at a time.
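
The access pattern listed above can be sketched in user space roughly as
follows. This is only an illustration of the pattern (one up-front
ftruncate(), then shared mappings in 64-page chunks with msync() and
munmap()), not rtorrent's code and not the test-program mentioned
elsewhere in this thread. The function names, the fill byte and the
read-back check are assumptions made for the example.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_PAGES 64

/* Fill a file of total_size bytes through shared mappings only, one
 * 64-page chunk at a time. Returns 0 on success, -1 on any failure. */
static int fill_via_chunked_mmap(const char *path, size_t total_size,
				 unsigned char fill)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t chunk, off;
	int fd;

	if (page_size <= 0)
		return -1;
	chunk = (size_t)page_size * CHUNK_PAGES;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Exactly one ftruncate(), up front, to the full final size. The
	 * file starts out sparse, with no backing store allocated. */
	if (ftruncate(fd, (off_t)total_size) < 0)
		goto fail;

	for (off = 0; off < total_size; off += chunk) {
		/* The last chunk is mapped byte-accurate to the file size. */
		size_t len = total_size - off < chunk ? total_size - off : chunk;
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			       fd, (off_t)off);

		if (p == MAP_FAILED)
			goto fail;
		memset(p, fill, len);		/* dirty the pages via the mapping */
		if (msync(p, len, MS_ASYNC) < 0) {
			munmap(p, len);
			goto fail;
		}
		if (munmap(p, len) < 0)
			goto fail;
	}
	return close(fd);
fail:
	close(fd);
	return -1;
}

/* Read the file back with read() and count bytes that differ from fill.
 * Returns -1 on I/O error or if the file length is not total_size. */
static long count_mismatches(const char *path, size_t total_size,
			     unsigned char fill)
{
	unsigned char buf[4096];
	size_t seen = 0;
	long bad = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++)
			if (buf[i] != fill)
				bad++;
		seen += (size_t)n;
	}
	close(fd);
	if (n < 0 || seen != total_size)
		return -1;
	return bad;
}
```

A caller would run fill_via_chunked_mmap() on a scratch file and then
expect count_mismatches() to return 0. Whether a given kernel ever
shows a mismatch depends on memory pressure and timing, so a clean run
proves little.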

Anyway, the _one_ half-way interesting thing is the fact that it doesn't 
allocate any backing store at all for the file, and as such the page 
writeback needs to create all the underlying buffers on the filesystem. I 
really don't see why that would be a problem either, but I could imagine 
that if we have some writeback bug where we can end up writing back the 
_same_ page concurrently, we'd actually end up racing in the kernel, and 
allocating two different backing stores, and then maybe the other one 
would effectively "get lost" (and the earlier writeback would win the 
race, explaining why we'd end up with zeroes at the end of a block).

Or something.

However, all the codepaths _seem_ to test for PG_writeback, and not even 
try to start another writeback while the first one is still active.
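
As a rough user-space model of that check (the kernel's real page-flag
helpers differ), a test-and-set on a per-page flag lets only one caller
start writeback until the flag is cleared again. The struct, the field
names and both functions are hypothetical and exist only for this
sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a page with a "writeback in progress" flag. */
struct page_model {
	atomic_flag writeback;	/* set while a writeback is active */
	int writes_started;	/* how many writebacks were actually started */
};

/* Try to start writeback. Returns false, and starts nothing, if another
 * writeback of the same page is still active. */
static bool start_writeback(struct page_model *pg)
{
	if (atomic_flag_test_and_set(&pg->writeback))
		return false;
	pg->writes_started++;
	return true;
}

/* Mark the active writeback as finished so a later one may start. */
static void end_writeback(struct page_model *pg)
{
	atomic_flag_clear(&pg->writeback);
}
```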

What would also actually be interesting is whether somebody can reproduce 
this on Reiserfs, for example. I _think_ all the reports I've seen are on 
ext2 or ext3, and if this is somehow writeback-related, it could be some 
bug that is just shared between the two by virtue of them still having a 
lot of stuff in common. 

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro

On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote:
> On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote:
> > 
> > 
> > On Sun, 24 Dec 2006, Andrei Popa wrote:
> > > 
> > > Hash check on download completion found bad chunks, consider using
> > > "safe_sync".
> > 
> > Dang. Did you get any warning messages from the kernel?
> > 
> > Linus
> 
> BTW, rmap.c patch is broken - needs at least

... but that doesn't affect most of the architectures - only sparc64 and
some of powerpc.  So it's definitely not enough.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro

On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> > 
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
> 
> Dang. Did you get any warning messages from the kernel?
> 
>   Linus

BTW, rmap.c patch is broken - needs at least

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/mm/rmap.c b/mm/rmap.c
index 57306fa..669acb2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -452,7 +452,7 @@ static int page_mkclean_one(struct page 
entry = ptep_clear_flush(vma, address, pte);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
-   set_pte_at(vma, address, pte, entry);
+   set_pte_at(mm, address, pte, entry);
lazy_mmu_prot_update(entry);
ret = 1;
}

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin


Linus Torvalds wrote:
>
> On Sun, 24 Dec 2006, Linus Torvalds wrote:
>
> > Peter, tell me I'm crazy, but with the new rules, the following condition
> > is a bug:
> >
> >  - shared mapping
> >  - writable
> >  - not already marked dirty in the PTE
>
> Ok, so how about this diff.
>
> I'm actually feeling good about this one. It really looks like
> "do_no_page()" was simply buggy, and that this explains everything.

Still trying to catch up here, so I'm not going to reply to any old
stuff and just start at the tip of the thread... Other than to say
that I really like cancel_page_dirty ;)

I think your patch is quite right so that's a good catch. But I'm not
too surprised that it does not help the problem, because I don't
think we have started shedding any old pte_dirty tests at
unmap/reclaim-time, have we? So the dirty bit isn't going to get lost,
as such.

I was hoping that you've almost narrowed it down to the filesystem
writeback code, with the last few mails?

Nick

> Please please please test. Throw all the other patches away (with the
> possible exception of the "update_mmu_cache()" sanity checker, which is
> still interesting in case some _other_ place does this too).
>
> Don't do the "wait_on_page_writeback()" thing, because it changes timings
> and might hide things for the wrong reasons.  Just apply this on top of a
> known failing kernel, and test.
>
> Linus
>
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..cf429c4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2247,21 +2249,23 @@ retry:
> if (pte_none(*page_table)) {
> flush_icache_page(vma, new_page);
> entry = mk_pte(new_page, vma->vm_page_prot);
> -   if (write_access)
> -   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> -   set_pte_at(mm, address, page_table, entry);
> if (anon) {
> inc_mm_counter(mm, anon_rss);
> lru_cache_add_active(new_page);
> page_add_new_anon_rmap(new_page, vma, address);
> +   if (write_access)
> +   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> } else {
> inc_mm_counter(mm, file_rss);
> page_add_file_rmap(new_page);
> +   entry = pte_wrprotect(entry);
> if (write_access) {
> dirty_page = new_page;
> get_page(dirty_page);
> +   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> }
> }
> +   set_pte_at(mm, address, page_table, entry);
> } else {
> /* One of our sibling threads was faster, back out. */
> page_cache_release(new_page);




--
SUSE Labs, Novell Inc.


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr

* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-24 11:35]:
> And if this doesn't fix it, I don't know what will..

Sorry, but it still fails (on top of plain 2.6.19).
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin

> Quoting Linus Torvalds <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content 
> corruption on ext3)
>
> Peter, tell me I'm crazy, but with the new rules, the following condition 
> is a bug:
> 
>  - shared mapping
>  - writable
>  - not already marked dirty in the PTE
> 
> because that combination means that the hardware can mark the PTE dirty 
> without us even realizing (and thus not marking the "struct page *" 
> dirty).
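
The three-part condition quoted above can be written as a small
predicate. This is a user-space model only: the flag bits and the
function name are made up for illustration and do not match any
architecture's PTE layout.

```c
#include <stdbool.h>

/* Hypothetical flag bits for the model, not a real PTE encoding. */
#define MODEL_PTE_WRITE 0x1u
#define MODEL_PTE_DIRTY 0x2u

/* True for the combination described as a bug: a shared mapping whose
 * entry is writable but not yet marked dirty, so hardware could set the
 * dirty bit without the kernel noticing. */
static bool violates_dirty_tracking(bool shared_mapping, unsigned int pte_flags)
{
	bool writable = (pte_flags & MODEL_PTE_WRITE) != 0;
	bool dirty = (pte_flags & MODEL_PTE_DIRTY) != 0;

	return shared_mapping && writable && !dirty;
}
```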

Er.
Sorry about bumping in, and I'm not sure I understand all of the discussion,
but this reminded me of an old issue with COW that created what looks
like a vaguely similar data corruption on infiniband. We solved this for
infiniband with MADV_DONTFORK, but I always wondered why does it not affect
other parts of kernel.  Small reminder from that discussion:

down mmap sem
get user pages
up mmap sem
page becomes shared, and COW (e.g. fork)
process writes to first byte of page <- gets a copy
Now we had a problem: struct page that we got from get user pages
does not point to a correct page in our process.
For example: if at some point we map this page for DMA, and
hardware writes to last byte of page -> process does not
see this data.

So for infiniband, what we do is a combination of
- prevent page from becoming COW while hardware might DMA to this page, and
- ask users not to write to page if hardware might DMA to same page
  (even if its using different bytes).
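
For reference, the MADV_DONTFORK part might look like this from user
space. It is a minimal sketch: the helper name and the use of an
anonymous private mapping are assumptions for the example, and real
infiniband code registers the buffer with the hardware as well.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate len bytes of anonymous memory and mark the range
 * MADV_DONTFORK, so that a child created by fork() does not inherit it
 * and the parent's pages in the range are not turned copy-on-write by
 * the fork. Returns NULL on failure. */
static void *alloc_dontfork_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}
```

The caller releases the buffer with munmap(buf, len) when the hardware
no longer uses it.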

I just wondered - is there some chance something like this could be happening in
the fs code?

HTH,

-- 
MST

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> Ok, so how about this diff.
>
> I'm actually feeling good about this one. It really looks like
> "do_no_page()" was simply buggy, and that this explains everything.


I tested with just this patch and 2.6.19 and no change. Sorry Linus,
no early Christmas present :-(

Gordon

--
Gordon Farquharson

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote:
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> > 
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
> 
> Dang. Did you get any warning messages from the kernel?
> 

only these:
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80

but I don't think that has anything to do with it...

>   Linus


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Andrei Popa wrote:
> 
> Hash check on download completion found bad chunks, consider using
> "safe_sync".

Dang. Did you get any warning messages from the kernel?

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote:
> 
> On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> > 
> > The apt cache files (/var/cache/apt/*.bin) still get corrupted with
> > this patch and 2.6.19.
> 
> Yeah, if my guess about do_no_page() is right, _none_ of the previous 
> patches should have ANY effect what-so-ever. In fact, I'd say that even 
> the "ext3 works in writeback mode" thing that Andrei reports is probably a 
> total fluke brought on by timing changes rather than anything else.
> 
> So please try the latest patch instead (on top of anything that shows 
> corruption reliably - the patch should be _totally_ independent of all the 
> other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
> 2.6.19 too, so anything that shows corruption is a fine target - but try 
> to choose something that has been the "best" at corrupting things for you, 
> to make the testing as good as possible).
> 
> Patch included here again (although I think you were cc'd on my previous 
> email too, so you should already have it, and our emails just crossed)
> 
> And if this doesn't fix it, I don't know what will..

With latest git and patches:
http://lkml.org/lkml/diff/2006/12/24/56/1
http://lkml.org/lkml/diff/2006/12/24/61/1

Hash check on download completion found bad chunks, consider using
"safe_sync".

> 
>   Linus
> 
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..cf429c4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2247,21 +2249,23 @@ retry:
>   if (pte_none(*page_table)) {
>   flush_icache_page(vma, new_page);
>   entry = mk_pte(new_page, vma->vm_page_prot);
> - if (write_access)
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> - set_pte_at(mm, address, page_table, entry);
>   if (anon) {
>   inc_mm_counter(mm, anon_rss);
>   lru_cache_add_active(new_page);
>   page_add_new_anon_rmap(new_page, vma, address);
> + if (write_access)
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   } else {
>   inc_mm_counter(mm, file_rss);
>   page_add_file_rmap(new_page);
> + entry = pte_wrprotect(entry);
>   if (write_access) {
>   dirty_page = new_page;
>   get_page(dirty_page);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   }
>   }
> + set_pte_at(mm, address, page_table, entry);
>   } else {
>   /* One of our sibling threads was faster, back out. */
>   page_cache_release(new_page);


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> 
> The apt cache files (/var/cache/apt/*.bin) still get corrupted with
> this patch and 2.6.19.

Yeah, if my guess about do_no_page() is right, _none_ of the previous 
patches should have ANY effect what-so-ever. In fact, I'd say that even 
the "ext3 works in writeback mode" thing that Andrei reports is probably a 
total fluke brought on by timing changes rather than anything else.

So please try the latest patch instead (on top of anything that shows 
corruption reliably - the patch should be _totally_ independent of all the 
other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
2.6.19 too, so anything that shows corruption is a fine target - but try 
to choose something that has been the "best" at corrupting things for you, 
to make the testing as good as possible).

Patch included here again (although I think you were cc'd on my previous 
email too, so you should already have it, and our emails just crossed)

And if this doesn't fix it, I don't know what will..

Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:


How about this particularly stupid diff? (please test with something that
_would_ cause corruption normally).

It is _entirely_ untested, but what it tries to do is to simply serialize
any writeback in progress with any process that tries to re-map a shared
page into its address space and dirty it. I haven't tested it, and maybe
it misses some case, but it looks like a good way to try to avoid races
with marking pages dirty and the writeback phase ..


The apt cache files (/var/cache/apt/*.bin) still get corrupted with
this patch and 2.6.19.

Gordon

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(&mapping->private_lock);
   ret = drop_buffers(page, &buffers_to_free);
   spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h    2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21 01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);

-static inline void clear_page_dirty(struct page *page)
-{
-   test_clear_page_dirty(page);
-}
-
static inline void set_page_writeback(struct page *page)
{
   test_set_page_writeback(page);
diff -Naupr linux-2.6.19.orig/mm/memory.c linux-2.6.19/mm/memory.c
--- linux-2.6.19.orig/mm/memory.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/mm/memory.c2006-12-24 11:04:03.0 -0700
@@ -1534,6 +1534,7 @@ static int do_wp_page(struct mm_struct *
   if (!pte_same(*page_table, orig_pte))
   goto unlock;
   }
+   wait_on_page_writeback(old_page);
   dirty_page = old_page;
   get_page(dirty_page);
   reuse = 1;
@@ -1832,6 +1833,33 @@ void unmap_mapping_range(struct address_
}
EXPORT_SYMBOL(unmap_mapping_range);

+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if (!mapping)
+   return;
+   offset = size & ~PAGE_MASK;
+   if (!offset)
+   return;
+   index = size >> PAGE_SHIFT;
+   page = find_lock_page(mapping, index);
+   if (page) {
+   unsigned int check = 0;
+   unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+   do {
+   check += kaddr[offset++];
+   } while (offset < PAGE_SIZE);
+   kunmap_atomic(kaddr,KM_USER0);
+   unlock_page(page);
+   page_cache_release(page);
+   if (check)
+   printk("%s: BADNESS: truncate check %u\n", current->comm, check);
+   }
+}
+
/**
 * vmtruncate - unmap mappings "freed" by truncate() syscall
 * @inode: inode of the file used
@@ -1865,6 +1893,7 @@ do_expand:
   goto ou

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Linus Torvalds wrote:
> 
> Peter, tell me I'm crazy, but with the new rules, the following condition 
> is a bug:
> 
>  - shared mapping
>  - writable
>  - not already marked dirty in the PTE

Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like 
"do_no_page()" was simply buggy, and that this explains everything.

Please please please test. Throw all the other patches away (with the 
possible exception of the "update_mmu_cache()" sanity checker, which is 
still interesting in case some _other_ place does this too).

Don't do the "wait_on_page_writeback()" thing, because it changes timings 
and might hide things for the wrong reasons.  Just apply this on top of a 
known failing kernel, and test.

Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Linus Torvalds wrote:
>
> How about this particularly stupid diff? (please test with something that 
> _would_ cause corruption normally).

Actually, here's an even more stupid diff, which actually to some degree 
seems to capture the real problem better.

Peter, tell me I'm crazy, but with the new rules, the following condition 
is a bug:

 - shared mapping
 - writable
 - not already marked dirty in the PTE

because that combination means that the hardware can mark the PTE dirty 
without us even realizing (and thus not marking the "struct page *" 
dirty).

(The above is actually a valid situation for IO mappings, but not for 
"real" mappings. And IO mappings should never take page faults, I think).
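
[Editor's sketch, not part of the original mail: the three-part condition
above written as a standalone predicate. The MODEL_* flag values and the
function name are illustrative assumptions; they are not the kernel's
VM_SHARED or PTE bit definitions.]

```c
#include <stdbool.h>

/* Illustrative flag values only; the real kernel definitions differ. */
#define MODEL_VM_SHARED  0x1u
#define MODEL_PTE_WRITE  0x2u
#define MODEL_PTE_DIRTY  0x4u

/*
 * True for the combination called a bug above: a shared mapping whose PTE
 * is writable but not yet dirty, so the hardware could set the dirty bit
 * without the kernel taking a fault and noticing.
 */
bool model_bad_shared_pte(unsigned int vm_flags, unsigned int pte)
{
	bool shared   = (vm_flags & MODEL_VM_SHARED) != 0;
	bool writable = (pte & MODEL_PTE_WRITE) != 0;
	bool dirty    = (pte & MODEL_PTE_DIRTY) != 0;

	return shared && writable && !dirty;
}
```

A writable and dirty PTE, or a read-only PTE, is fine under this rule; only
the writable-but-clean case in a shared mapping is flagged.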

So, with that in mind, I wrote this stupid patch (for 32-bit x86, since I 
used my Mac Mini for testing rather than my main machine - but the x86-64 
version should be pretty much identical)..

And you know what, Peter? It triggers for me. I get

WARNING at mm/memory.c:2274 do_no_page()
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __handle_mm_fault+0x38d/0x919
 [] do_page_fault+0x1ff/0x507
 [] error_code+0x7c/0x84

which seems to say that do_no_page() can be used to insert shared and 
non-dirty, but still writable, pages.
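
[Editor's sketch, not part of the original mail: a toy model of how a read
fault on a file page could leave a writable, clean PTE, and how write
protecting the entry first would avoid that state. All names are
hypothetical, and the model assumes that the protection bits of a shared
writable mapping grant write access. It only illustrates the state being
warned about; it makes no claim about whether this explains the corruption
reports.]

```c
#include <stdbool.h>

struct model_pte {
	bool write;
	bool dirty;
};

/* Assumption: the VMA's protection bits carry write permission as-is. */
static struct model_pte model_mk_pte(bool vma_writable)
{
	struct model_pte e = { vma_writable, false };
	return e;
}

/* Ordering without write protection: only write faults touch the entry. */
struct model_pte model_old_file_fault(bool vma_writable, bool write_access)
{
	struct model_pte e = model_mk_pte(vma_writable);

	if (write_access) {
		e.dirty = true;			/* like pte_mkdirty() */
		e.write = vma_writable;		/* like maybe_mkwrite() */
	}
	return e;
}

/* Ordering with write protection first, then write + dirty on write faults. */
struct model_pte model_new_file_fault(bool vma_writable, bool write_access)
{
	struct model_pte e = model_mk_pte(vma_writable);

	e.write = false;			/* like pte_wrprotect() */
	if (write_access) {
		e.dirty = true;
		e.write = vma_writable;
	}
	return e;
}

/* The state the WARN_ON above looks for: writable but not dirty. */
bool model_is_bad(struct model_pte e)
{
	return e.write && !e.dirty;
}
```

In this model a read fault (write_access false) on a writable mapping gives
a bad entry with the first ordering and a read-only, clean entry with the
second, so a later store has to fault before the page can become dirty.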

But maybe my patch is just bogus, and I didn't think it through.

Peter, I realize it's Christmas Eve, but let's face it, Santa appreciates 
good boys and girls, and we all want tons of loot. So please be good, and 
waste some time looking at this and tell me why I'm either wrong, or 
there's a real smoking gun here.. ;)

Linus

---
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..1389bb7 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -494,7 +494,13 @@ do { \
  * The i386 doesn't have any external MMU info: the kernel page
  * tables contain all the necessary information.
  */
-#define update_mmu_cache(vma,address,pte) do { } while (0)
+#define bad_shared_pte(pte) (pte_write(pte) && !pte_dirty(pte))
+#define update_mmu_cache(vma,address,pte) do { \
+   static int __cnt;   \
+   WARN_ON(((vma)->vm_flags & VM_SHARED)   \
+&& bad_shared_pte(pte) \
+&& ++__cnt < 5);   \
+} while (0)
 #endif /* !__ASSEMBLY__ */

 #ifdef CONFIG_FLATMEM

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 09:16:06 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> 
> > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> > > Andrei Popa <[EMAIL PROTECTED]> wrote:
> > > > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > > > 
> > > > I don't have corruption. I tested twice.
> > > 
> > > > This is a surprising result.  Can you please retest ext3 
> > > data=writeback,nobh?
> > 
> > Yes, no corruption. Also tested only with data=writeback and had no
> > corruption.
> 
> Ok, so it would seem to be writeback related _somehow_. However, most of 
> the differences (I _thought_) in ext3 actually show up only if you have 
> *both* "nobh" and "data=writeback", and as far as I can tell, just a 
> simple "data=writeback" should still use the bog-standard 
> "block_write_full_page()".
> 
> Andrew?
> 
> Although as far as I can see, then ext2 should work as-is too (since it 
> too also just uses "block_write_full_page()" without anything fancy).

ext2 uses the multipage-bio assembly code for writeback whereas ext3
doesn't.  But ext3 doesn't use that code in data=ordered mode, of course.

Still, this:

--- a/fs/ext2/inode.c~a
+++ a/fs/ext2/inode.c
@@ -693,7 +693,7 @@ const struct address_space_operations ex
.commit_write   = generic_commit_write,
.bmap   = ext2_bmap,
.direct_IO  = ext2_direct_IO,
-   .writepages = ext2_writepages,
+// .writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
 };
 
@@ -711,7 +711,7 @@ const struct address_space_operations ex
.commit_write   = nobh_commit_write,
.bmap   = ext2_bmap,
.direct_IO  = ext2_direct_IO,
-   .writepages = ext2_writepages,
+// .writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
 };
 
_

will switch it off for ext2.


> Strange.
> 
> How about this particularly stupid diff? (please test with something that 
> _would_ cause corruption normally).
> 
> It is _entirely_ untested, but what it tries to do is to simply serialize 
> any writeback in progress with any process that tries to re-map a shared 
> page into its address space and dirty it. I haven't tested it, and maybe 
> it misses some case, but it looks like a good way to try to avoid races 
> with marking pages dirty and the writeback phase ..
> 
>   Linus
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..64ed10b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   if (!pte_same(*page_table, orig_pte))
>   goto unlock;
>   }
> + wait_on_page_writeback(old_page);
>   dirty_page = old_page;
>   get_page(dirty_page);
>   reuse = 1;
> @@ -2215,6 +2216,7 @@ retry:
>   page_cache_release(new_page);
>   return VM_FAULT_SIGBUS;
>   }
> + wait_on_page_writeback(new_page);
>   }
>   }

yup.  Also, we could perhaps lock the target page during pagefaults..


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Andrei Popa wrote:

> On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> > Andrei Popa <[EMAIL PROTECTED]> wrote:
> > > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > > 
> > > I don't have corruption. I tested twice.
> > 
> > This is a surprising result.  Can you please retest ext3 data=writeback,nobh?
> 
> Yes, no corruption. Also tested only with data=writeback and had no
> corruption.

Ok, so it would seem to be writeback related _somehow_. However, most of 
the differences (I _thought_) in ext3 actually show up only if you have 
*both* "nobh" and "data=writeback", and as far as I can tell, just a 
simple "data=writeback" should still use the bog-standard 
"block_write_full_page()".

Andrew?

Although as far as I can see, then ext2 should work as-is too (since it 
too also just uses "block_write_full_page()" without anything fancy).

Strange.

How about this particularly stupid diff? (please test with something that 
_would_ cause corruption normally).

It is _entirely_ untested, but what it tries to do is to simply serialize 
any writeback in progress with any process that tries to re-map a shared 
page into its address space and dirty it. I haven't tested it, and maybe 
it misses some case, but it looks like a good way to try to avoid races 
with marking pages dirty and the writeback phase ..

Linus
---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..64ed10b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pte_same(*page_table, orig_pte))
goto unlock;
}
+   wait_on_page_writeback(old_page);
dirty_page = old_page;
get_page(dirty_page);
reuse = 1;
@@ -2215,6 +2216,7 @@ retry:
page_cache_release(new_page);
return VM_FAULT_SIGBUS;
}
+   wait_on_page_writeback(new_page);
}
}


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> On Sun, 24 Dec 2006 14:14:38 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
> 
> > > - mount the fs with ext2 with the no-buffer-head option.  That means 
> > > either:
> > > 
> > >   grub.conf:  rootfstype=ext2 rootflags=nobh
> > >   /etc/fstab: ext2 nobh
> > 
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext2 (rw,noatime,nobh)
> > 
> > I have corruption.
> > 
> > > 
> > > - mount the fs with ext3 data=writeback, nobh
> > > 
> > > >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
> > > >   /etc/fstab: ext3 data=writeback,nobh
> > 
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > 
> > ierdnac ~ # dmesg|grep EXT3
> > EXT3-fs: mounted filesystem with writeback data mode.
> > EXT3 FS on sda7, internal journal
> > 
> > I don't have corruption. I tested twice.
> 
> This is a surprising result.  Can you please retest ext3 data=writeback,nobh?

Yes, no corruption. Also tested only with data=writeback and had no
corruption.


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr

* Andrew Morton <[EMAIL PROTECTED]> [2006-12-24 00:57]:
>   /etc/fstab: ext2 nobh
>   /etc/fstab: ext3 data=writeback,nobh

It seems that busybox mount ignores the nobh option but both ext2 and
ext3 data=writeback work for me.  This is with plain 2.6.19 which
normally always fails.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 14:14:38 +0200
Andrei Popa <[EMAIL PROTECTED]> wrote:

> > - mount the fs with ext2 with the no-buffer-head option.  That means either:
> > 
> >   grub.conf:  rootfstype=ext2 rootflags=nobh
> >   /etc/fstab: ext2 nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext2 (rw,noatime,nobh)
> 
> I have corruption.
> 
> > 
> > - mount the fs with ext3 data=writeback, nobh
> > 
> > >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
> > >   /etc/fstab: ext3 data=writeback,nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext3 (rw,noatime,nobh)
> 
> ierdnac ~ # dmesg|grep EXT3
> EXT3-fs: mounted filesystem with writeback data mode.
> EXT3 FS on sda7, internal journal
> 
> I don't have corruption. I tested twice.

This is a surprising result.  Can you please retest ext3 data=writeback,nobh?

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 14:26:01 +0200
Andrei Popa <[EMAIL PROTECTED]> wrote:

> I also tested with ext3 ordered, nobh  and I have file corruption...

ordered+nobh isn't a possible combination.  The filesystem probably ignored
nobh.  nobh mode only makes sense with data=writeback.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote:
> On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
> > On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> > Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > 
> > > I now _suspect_ that we're talking about something like
> > > 
> > >  - we started a writeout. The IO is still pending, and the page was 
> > >marked clean and is now in the "writeback" phase.
> > >  - a write happens to the page, and the page gets marked dirty again. 
> > >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> > >but they were actually already dirty, because the IO hasn't completed 
> > >yet.
> > >  - the IO from the _previous_ write completes, and marks the buffers 
> > > clean 
> > >again.
> > 
> > Some things for the testers to try, please:
> > 
> > - mount the fs with ext2 with the no-buffer-head option.  That means either:
> > 
> >   grub.conf:  rootfstype=ext2 rootflags=nobh
> >   /etc/fstab: ext2 nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext2 (rw,noatime,nobh)
> 
> I have corruption.
> 
> > 
> > - mount the fs with ext3 data=writeback, nobh
> > 
> > >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
> > >   /etc/fstab: ext3 data=writeback,nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext3 (rw,noatime,nobh)
> 
> ierdnac ~ # dmesg|grep EXT3
> EXT3-fs: mounted filesystem with writeback data mode.
> EXT3 FS on sda7, internal journal
> 
> I don't have corruption. I tested twice.
> 

I also tested with ext3 ordered, nobh  and I have file corruption...

> > 
> > if that still fails we can rule out buffer_head funnies.
> > 


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
> On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > I now _suspect_ that we're talking about something like
> > 
> >  - we started a writeout. The IO is still pending, and the page was 
> >marked clean and is now in the "writeback" phase.
> >  - a write happens to the page, and the page gets marked dirty again. 
> >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> >but they were actually already dirty, because the IO hasn't completed 
> >yet.
> >  - the IO from the _previous_ write completes, and marks the buffers clean 
> >again.
> 
> Some things for the testers to try, please:
> 
> - mount the fs with ext2 with the no-buffer-head option.  That means either:
> 
>   grub.conf:  rootfstype=ext2 rootflags=nobh
>   /etc/fstab: ext2 nobh

ierdnac ~ # mount
/dev/sda7 on / type ext2 (rw,noatime,nobh)

I have corruption.

> 
> - mount the fs with ext3 data=writeback, nobh
> 
>   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
>   /etc/fstab: ext3 data=writeback,nobh

ierdnac ~ # mount
/dev/sda7 on / type ext3 (rw,noatime,nobh)

ierdnac ~ # dmesg|grep EXT3
EXT3-fs: mounted filesystem with writeback data mode.
EXT3 FS on sda7, internal journal

I don't have corruption. I tested twice.

> 
> if that still fails we can rule out buffer_head funnies.
> 


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Andrew Morton wrote:
> 
> > I now _suspect_ that we're talking about something like
> > 
> >  - we started a writeout. The IO is still pending, and the page was 
> >marked clean and is now in the "writeback" phase.
> >  - a write happens to the page, and the page gets marked dirty again. 
> >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> >but they were actually already dirty, because the IO hasn't completed 
> >yet.
> >  - the IO from the _previous_ write completes, and marks the buffers clean 
> >again.
> 
> Some things for the testers to try, please:
> 
> - mount the fs with ext2 with the no-buffer-head option.  That means either:

[ snip snip ]

This is definitely worth testing, but the exact scenario I outlined is 
probably not the thing that happens. It was really meant to be more of an 
example of the _kind_ of situation I think we might have.

That would explain why we didn't see this before: we simply didn't mark 
pages clean all that aggressively, and an app like rtorrent would normally 
have caused its flushes to happen _synchronously_ by using msync() (even 
if the IO itself was done asynchronously, all the dirty bit stuff would be 
synchronous wrt any rtorrent behaviour).
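
[Editor's sketch, not part of the original mail: the synchronous pattern
described above, where an application dirties a shared file mapping and then
flushes it itself with msync(). The helper name store_via_shared_mapping()
is hypothetical; the calls used (open, ftruncate, mmap, msync, munmap) are
standard POSIX. This is not rtorrent's actual code.]

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store `len` bytes at the start of `path` through a
 * MAP_SHARED mapping, then flush synchronously with msync(MS_SYNC).
 * Returns 0 on success, -1 on error.
 */
int store_via_shared_mapping(const char *path, const void *data, size_t len)
{
	int fd;
	int rc = -1;
	void *map;

	if (len == 0)
		return -1;
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	memcpy(map, data, len);		/* dirties the page via the mapping */

	/* The caller blocks here until the dirty data has been written. */
	if (msync(map, len, MS_SYNC) == 0)
		rc = 0;
	if (munmap(map, len) < 0)
		rc = -1;
out_close:
	if (close(fd) < 0)
		rc = -1;
	return rc;
}
```

With this pattern the flush is driven by the application, so the dirty-bit
handling happens in step with the application's own stores. The asynchronous
cleaning discussed in this thread is the case where the kernel does it
behind the application's back.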

And the things that /did/ use to clean pages asynchronously (VM scanning) 
would always actually look at the "young" bit (aka "accessed") and not 
even touch the dirty bit if an application had accessed the page recently, 
so that basically avoided any likely races, because we'd touch the dirty 
bit ONLY if the page was "cold".

So this is why I'm saying that it might be an old bug, and it would be 
just the new pattern of handling dirty bits that triggers it.

But avoiding buffer heads and testing that part is worth doing. Just to 
remove one thing from the equation.

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> I now _suspect_ that we're talking about something like
> 
>  - we started a writeout. The IO is still pending, and the page was 
>marked clean and is now in the "writeback" phase.
>  - a write happens to the page, and the page gets marked dirty again. 
>Marking the page dirty also marks all the _buffers_ in the page dirty, 
>but they were actually already dirty, because the IO hasn't completed 
>yet.
>  - the IO from the _previous_ write completes, and marks the buffers clean 
>again.

Some things for the testers to try, please:

- mount the fs with ext2 with the no-buffer-head option.  That means either:

  grub.conf:  rootfstype=ext2 rootflags=nobh
  /etc/fstab: ext2 nobh

- mount the fs with ext3 data=writeback, nobh

  grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
  /etc/fstab: ext3 data=writeback,nobh

if that still fails we can rule out buffer_head funnies.


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> 
> Is there any way to provide any debugging information that may help
> solve the problem ?

I think we have people working on this. I know I'm trying to even come up 
with an idea of what is going on. I don't think we know yet.

> Would it help to know the nature of the corruption e.g. an analysis
> of the corruption in the file ?

I actually think we know that, because Andrei already gave details. The 
corruption seems to be basically a few pages that get zeroes at the end 
rather than the expected contents. That's consistent with the page being 
written out once, but then _not_ getting written out again despite being 
dirtied some more.
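
For anyone who wants to look for that pattern in a damaged file, something like the following user-space check might help. It is only a rough sketch: zero_tail_len() and count_zero_tailed_pages() are made-up helper names, and the caller has to pick the page size (4096 is an assumption, not something taken from the reports).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of consecutive zero bytes at the end of buf. */
static size_t zero_tail_len(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    while (n < len && buf[len - 1 - n] == 0)
        n++;
    return n;
}

/*
 * Hypothetical helper: walk data in page_size chunks and count the chunks
 * whose run of trailing zeroes is at least min_tail bytes long. A short
 * final chunk is checked with its actual length.
 */
static size_t count_zero_tailed_pages(const unsigned char *data, size_t len,
                                      size_t page_size, size_t min_tail)
{
    size_t hits = 0;

    assert(page_size > 0);
    for (size_t off = 0; off < len; off += page_size) {
        size_t chunk = len - off < page_size ? len - off : page_size;

        if (zero_tail_len(data + off, chunk) >= min_tail)
            hits++;
    }
    return hits;
}
```

A page that legitimately ends in zeroes will also be counted, so the result is only a hint about where to compare against a known-good copy.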

But if you see any other pattern, please holler, because that would be 
interesting.

> BTW, I decided to try Linus's test program [1] on ARM (I don't think
> that anybody had tried it on ARM before).

You get the expected results, and in fact, I'd be very surprised if you 
didn't. It's something subtler than that going on.

I now _suspect_ that we're talking about something like

 - we started a writeout. The IO is still pending, and the page was 
   marked clean and is now in the "writeback" phase.
 - a write happens to the page, and the page gets marked dirty again. 
   Marking the page dirty also marks all the _buffers_ in the page dirty, 
   but they were actually already dirty, because the IO hasn't completed 
   yet.
 - the IO from the _previous_ write completes, and marks the buffers clean 
   again.

And no, that's not actually what is going on. The thing is, we actually 
clear the buffer dirty bits when we start the IO, not when we end it, but 
I think it is going to be this _kind_ of situation, where we missed 
something, and marked it clean too late, and thus cleared a dirty bit.
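
To make the _kind_ of race concrete, here is a toy user-space model. It is a sketch only: struct toy_buffer and the toy_* functions are invented for illustration and are not the kernel's buffer_head code. It just shows that clearing the dirty bit at IO completion loses a write that raced with the IO, while clearing it at IO start does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one buffer; NOT the kernel's struct buffer_head. */
struct toy_buffer {
    bool dirty;        /* memory contents are newer than what is on disk */
    bool io_in_flight; /* a write of an older snapshot is still pending */
};

enum clear_policy { CLEAR_AT_IO_START, CLEAR_AT_IO_END };

static void toy_start_io(struct toy_buffer *b, enum clear_policy p)
{
    assert(!b->io_in_flight);
    b->io_in_flight = true;
    if (p == CLEAR_AT_IO_START)
        b->dirty = false; /* snapshot taken, later writes re-dirty it */
}

static void toy_write(struct toy_buffer *b)
{
    b->dirty = true;
}

static void toy_end_io(struct toy_buffer *b, enum clear_policy p)
{
    b->io_in_flight = false;
    if (p == CLEAR_AT_IO_END)
        b->dirty = false; /* too late: wipes out a write that raced with the IO */
}

/* Returns true if the buffer is still dirty after: start IO, write, end IO. */
static bool toy_redirty_survives(enum clear_policy p)
{
    struct toy_buffer b = { .dirty = true, .io_in_flight = false };

    toy_start_io(&b, p);
    toy_write(&b);
    toy_end_io(&b, p);
    return b.dirty;
}
```

With CLEAR_AT_IO_END the second write is never scheduled for writeback, which is the sort of "cleared a dirty bit too late" mistake being suspected here.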

I don't think it's a page table issue any more, it just doesn't look 
likely with the ARM UP corruption. It's also not apparently even on a 
cacheline boundary, so it probably is really a dirty bit that got cleared 
wrongly due to some race with IO.

But right now we're all clueless. I personally suspect it's not even a new 
bug: it's probably an old bug that simply didn't matter before.

Linus


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote:
> * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]:
> > >  and it failed.
> > Since you are on ARM you might want to try with the page_mkclean_one
> > cleanup patch too.
>
> I've already tried it and it didn't work.  I just tried it again
> together with Linus' patch and the two from Andrew and it still fails.
> (For reference, the patch is attached.)


I can confirm this behaviour with 2.6.19 and the patches mentioned
above (cumulative patch for 2.6.19 appended to the end of this email).

Is there any way to provide any debugging information that may help
solve the problem ? Would it help to know the nature of the corruption
e.g. an analysis of the corruption in the file ? I have previously
asked apt developers if they wanted to look at the corrupted cache
files, but there were no takers then.

BTW, I decided to try Linus's test program [1] on ARM (I don't think
that anybody had tried it on ARM before).

Since we see file corruption with 2.6.18 + [PATCH] mm: tracking shared
dirty pages [2], I ran Linus's program on machines with the following
setups:

2.6.18 + the following patches
  mm: tracking shared dirty pages [2]
  mm: balance dirty pages [3]
  mm: optimize the new mprotect() code a bit [4]
  mm: small cleanup of install_page() [5]
  mm: fixup do_wp_page() [6]
  mm: msync() cleanup [7]

$ ./mm-test | od -x
000        
020        
040    
050

2.6.18 (no mm patches)

$ ./mm-test | od -x
000        
020        
040    
050

I don't know if this helps at all.

Gordon

[1] http://lkml.org/lkml/2006/12/19/200
[2] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
[3] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=edc79b2a46ed854595e40edcf3f8b37f9f14aa3f
[4] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c1e6098b23bb46e2b488fe9a26f831f867157483
[5] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e88dd6c11c5aef74d8b74a062767add53315533b
[6] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ee6a6457886a80415db209e87033b63f2b06558c
[7] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=204ec841fbea3e5138168edbc3a76d46747cc987

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(&mapping->private_lock);
   ret = drop_buffers(page, &buffers_to_free);
   spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h    2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21 01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

 struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
 int test_clear

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa

On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote:
> * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> > With all three patches I have corruption
> 
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again... but I really
> need a better testcase since an installation takes about an hour.
> Andrei, which torrent do you download as a testcase?  It would be good
> if someone could suggest a torrent which is legal and not too large.
It's a 1.4GB file torrent split in 84 rar files and there are many
seeders. I download with ~ 5MB/sec. The torrent is private.


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]:
> >  and it failed.
> Since you are on ARM you might want to try with the page_mkclean_one
> cleanup patch too.

I've already tried it and it didn't work.  I just tried it again
together with Linus' patch and the two from Andrew and it still fails.
(For reference, the patch is attached.)
-- 
Martin Michlmayr
http://www.cyrius.com/
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4830a3b..350878a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -253,15 +253,11 @@ #define ClearPageUncached(page)   clear_bi
 
 struct page;   /* forward declaration */
 
-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
 int test_clear_page_writeback(struct page *page);
 int test_set_page_writeback(struct page *page);
 
-static inline void clear_page_dirty(struct page *page)
-{
-   test_clear_page_dirty(page);
-}
-
 static inline void set_page_writeback(struct page *page)
 {
test_set_page_writeback(page);
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..79cecab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if (!mapping)
+   return;
+   offset = size & ~PAGE_MASK;
+   if (!offset)
+   return;
+   index = size >> PAGE_SHIFT;
+   page = find_lock_page(mapping, index);
+   if (page) {
+   unsigned int check = 0;
+   unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+   do {
+   check += kaddr[offset++];
+   } while (offset < PAGE_SIZE);
+   kunmap_atomic(kaddr,KM_USER0);
+   unlock_page(page);
+   page_cache_release(page);
+   if (check)
+   printk("%s: BADNESS: truncate check %u\n", current->comm, check);
+   }
+}
+
 /**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
@@ -1875,6 +1902,7 @@ do_expand:
goto out_sig;
if (offset > inode->i_sb->s_maxbytes)
goto out_big;
+   check_last_page(mapping, inode->i_size);
i_size_write(inode, offset);
 
 out_truncate:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 237107c..b3a198c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -845,38 +845,6 @@ int set_page_dirty_lock(struct page *pag
 EXPORT_SYMBOL(set_page_dirty_lock);
 
 /*
- * Clear a page's dirty flag, while caring for dirty memory accounting. 
- * Returns true if the page was previously dirty.
- */
-int test_clear_page_dirty(struct page *page)
-{
-   struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
-
-   if (!mapping)

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Linus Torvalds

On Fri, 22 Dec 2006, Peter Zijlstra wrote:
> 
> fix page_mkclean_one()
> 
>  - add flush_cache_page() for all those virtual indexed cache
>architectures.

I think the flush_cache_page() should be after we've actually flushed it 
from the TLB and re-inserted it (this is one reason why I did the 
"ptep_exchange()" version of this). Otherwise somebody can still write to 
the page _after_ the cache flush..

>  - handle s390.

Yeah, that looks like the proper way to handle that.

That said, it looks like we still see corruption. You may not, but Martin 
and Andrei still report problems, even with all the patches (including the 
last one from Andrew that avoids "dirty" going negative under some 
circumstances, and explains the "slow and/or never completed" case that 
Gordon and Martin saw).

The good news is that I think the code now is cleaner and more 
understandable. The bad news is that nothing we've ever tried seems to 
have fixed the _problem_.

And I don't think it's page_mkclean(). Especially not since the ARM people 
are seeing this under UP without PREEMPT. In that kind of scenario, the 
only possible races tend to be from things that actually block: 
"set_page_dirty()" (which blocks on IO in balancing), memory allocations, 
and obviously doing actual IO.

And it's not a virtual cache problem, since others see it on x86.

Of course, since it's quite possibly two different issues, maybe the 
virtual cache flush is required in order to force write-back to memory 
(which in turn is required for the DMA for the actual write!). So the ARM 
issue certainly could be due to the flush_cache_page() thing...

Linus

Re: 2.6.19 file content corruption on ext3

2006-12-22 Thread Linus Torvalds

On Mon, 18 Dec 2006, Gene Heskett wrote:
>
> What about the mm/rmap.c one liner, in or out?

The one that just removes the "pte_mkclean()"? That's definitely out, it 
was just a test-patch to verify that the pte dirty bits seemed to matter 
at all (and they do).

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-22 08:30]:
> Based on the kernel gurus' current knowledge of the problem, would
> you expect the corruption to occur at the same point in a file, or
> is it possible that the corruption could occur at different points
> on successive Debian installer attempts on a UP, non PREEMPT system?

Seems like it can occur anywhere.  In fact, some people see apt
problems because of filesystem corruption on the NSLU2 after they have
already installed Debian.  I've only seen this once myself and failed
many times to find a reproducible situation.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: 2.6.19 file content corruption on ext3

2006-12-22 Thread Marc Haber

On Sat, Dec 16, 2006 at 06:43:10PM +, Martin Michlmayr wrote:
> * Marc Haber <[EMAIL PROTECTED]> [2006-12-09 10:26]:
> > Unfortunately, I am lacking the knowledge needed to do this in an
> > informed way. I am neither familiar enough with git nor do I possess
> > the necessary C powers.
> 
> I wonder if what you're seeing is related to
> http://lkml.org/lkml/2006/12/16/73
> 
> You said that you don't see any corruption with 2.6.18.  Can you try
> to apply the patch from
> http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
> to 2.6.18 to see if the corruption shows up?

Since I am no longer seeing the issue after easing the memory load, I
doubt that this would make sense.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 72739835

Re: 2.6.19 file content corruption on ext3

2006-12-22 Thread Marc Haber

On Fri, Dec 22, 2006 at 08:30:06AM -0500, Daniel Drake wrote:
> Marc Haber wrote:
> >After updating to 2.6.19, Debian's apt control file
> >/var/cache/apt/pkgcache.bin corrupts pretty frequently - like in under
> >six hours. In that situation, "aptitude update" segfaults. When I
> >delete the file and have apt recreate it, things are fine again for a
> >few hours before the file is broken again and the segfaults start over.
> >In all cases, umounting the file system and doing an fsck does not
> >show issues with the file system.
> 
> Are you using wireless networking of any kind?

Since the system in question is a colocated server box, I am pretty
sure that there is no wireless networking.

>  Might be useful if you could post 'dmesg' output so that people can
>  see the other hardware that you have.

I have attached what I could scrape from syslog.

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 72739835
Dec 18 15:45:01 torres syslogd 1.4.1#17: restart.
Dec 18 15:45:01 torres kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Dec 18 15:45:01 torres kernel: Inspecting /boot/System.map-2.6.19.1-zgsrv
Dec 18 15:45:01 torres kernel: Loaded 26500 symbols from /boot/System.map-2.6.19.1-zgsrv.
Dec 18 15:45:01 torres kernel: Symbols match kernel version 2.6.19.
Dec 18 15:45:01 torres kernel: No module symbols loaded - kernel modules not enabled.
Dec 18 15:45:01 torres kernel: Linux version 2.6.19.1-zgsrv ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 Sun Dec 17 12:44:56 UTC 2006
Dec 18 15:45:01 torres kernel: BIOS-provided physical RAM map:
Dec 18 15:45:01 torres kernel:  BIOS-e820:  - 000a (usable)
Dec 18 15:45:01 torres kernel:  BIOS-e820: 000f - 0010 (reserved)
Dec 18 15:45:01 torres kernel:  BIOS-e820: 0010 - 0f7f (usable)
Dec 18 15:45:01 torres kernel:  BIOS-e820: 0f7f - 0f7f3000 (ACPI NVS)
Dec 18 15:45:01 torres kernel:  BIOS-e820: 0f7f3000 - 0f80 (ACPI data)
Dec 18 15:45:01 torres kernel:  BIOS-e820:  - 0001 (reserved)
Dec 18 15:45:01 torres kernel: 0MB HIGHMEM available.
Dec 18 15:45:01 torres kernel: 247MB LOWMEM available.
Dec 18 15:45:01 torres kernel: Entering add_active_range(0, 0, 63472) 0 entries of 256 used
Dec 18 15:45:01 torres kernel: Zone PFN ranges:
Dec 18 15:45:01 torres kernel:   DMA 0 -> 4096
Dec 18 15:45:01 torres kernel:   Normal   4096 ->63472
Dec 18 15:45:01 torres kernel:   HighMem 63472 ->63472
Dec 18 15:45:01 torres kernel: early_node_map[1] active PFN ranges
Dec 18 15:45:01 torres kernel: 0:0 ->63472
Dec 18 15:45:01 torres kernel: On node 0 totalpages: 63472
Dec 18 15:45:01 torres kernel:   DMA zone: 32 pages used for memmap
Dec 18 15:45:01 torres kernel:   DMA zone: 0 pages reserved
Dec 18 15:45:01 torres kernel:   DMA zone: 4064 pages, LIFO batch:0
Dec 18 15:45:01 torres kernel:   Normal zone: 463 pages used for memmap
Dec 18 15:45:01 torres kernel:   Normal zone: 58913 pages, LIFO batch:15
Dec 18 15:45:01 torres kernel:   HighMem zone: 0 pages used for memmap
Dec 18 15:45:01 torres kernel: DMI 2.2 present.
Dec 18 15:45:01 torres kernel: ACPI: RSDP (v000 VIA694    ) @ 0x000f8050
Dec 18 15:45:01 torres kernel: ACPI: RSDT (v001 VIA694 MSI ACPI 0x42302e31 AWRD 0x) @ 0x0f7f3000
Dec 18 15:45:01 torres kernel: ACPI: FADT (v001 VIA694 MSI ACPI 0x42302e31 AWRD 0x) @ 0x0f7f3040
Dec 18 15:45:01 torres kernel: ACPI: DSDT (v001 VIA694 AWRDACPI 0x1000 MSFT 0x010c) @ 0x
Dec 18 15:45:01 torres kernel: ACPI: PM-Timer IO Port: 0x4008
Dec 18 15:45:01 torres kernel: Allocating PCI resources starting at 1000 (gap: 0f80:f07f)
Dec 18 15:45:01 torres kernel: Detected 1466.361 MHz processor.
Dec 18 15:45:01 torres kernel: Built 1 zonelists.  Total pages: 62977
Dec 18 15:45:01 torres kernel: Kernel command line: root=/dev/hda1 ro vga=normal
Dec 18 15:45:01 torres kernel: Enabling fast FPU save and restore... done.
Dec 18 15:45:01 torres kernel: Enabling unmasked SIMD FPU exception support... done.
Dec 18 15:45:01 torres kernel: Initializing CPU#0
Dec 18 15:45:01 torres kernel: PID hash table entries: 1024 (order: 10, 4096 bytes)
Dec 18 15:45:01 torres kernel: Console: colour VGA+ 80x25
Dec 18 15:45:01 torres kernel: Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Dec 18 15:45:01 torres kernel: Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Dec 18 15:45:01 torres kernel: Memory: 246964k/253888k available (2896k kernel code, 6368k reserved, 859k data, 204k init, 0k highmem)
Dec 18 15:45:0

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote:
> ... and now that we've completed this step, the apt cache has suddenly
> been reduced (see Gordon's mail for an explanation) and it segfaults:
>
> sh-3.1# ls -l /var/cache/apt/
> total 12524
> drwxr-xr-x 3 root root   12288 Dec 22 04:41 archives
> -rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin
> -rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin
> sh-3.1# apt-get -f install
> Reading package lists... Done
> Segmentation faulty tree... 50%

I think that we are seeing different manifestations of apt's response
to corrupted cache files. There does not appear to be any pattern to
which manifestation occurs. Maybe it depends on where in the cache
file the corruption is located, i.e. when the corruption occurs. Based
on the kernel gurus' current knowledge of the problem, would you expect
the corruption to occur at the same point in a file, or is it possible
that the corruption could occur at different points on successive
Debian installer attempts on a UP, non-PREEMPT system?

Gordon

--
Gordon Farquharson

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Patrick Mau

On Fri, Dec 22, 2006 at 01:32:49PM +0100, Martin Michlmayr wrote:
> * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> > With all three patches I have corruption
> 
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again... but I really
> need a better testcase since an installation takes about an hour.
> Andrei, which torrent do you download as a testcase?  It would be good
> if someone could suggest a torrent which is legal and not too large.

Hi everyone,

I have been reading this thread for the last few days, but have been
silent. I have 3 torrents here for testing, if you want.

You can easily reproduce with "rtorrent", if you:

- Have a completely downloaded one, no matter what size
- Corrupt the download with
  dd if=/dev/zero of=download.file bs=16k count=1
- Restart 'rtorrent', hash-check fails
- It will download 1 piece that was corrupted.

The important part here is that rtorrent transfers one piece,
using its own code sequence to write to the file.
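
One caveat on the dd step: without conv=notrunc, dd also truncates download.file to 16k. If the intent is to damage only the first piece and leave the rest of the file alone, the overwrite has to happen in place. A minimal sketch of that in C, assuming that in-place behaviour is what is wanted (zero_file_head() is a made-up helper name, not part of any tool mentioned here):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: overwrite the first nbytes of an existing file with
 * zeroes without truncating it. If the file is shorter than nbytes it grows
 * to nbytes. Returns 0 on success, -1 on error.
 */
static int zero_file_head(const char *path, size_t nbytes)
{
    static const unsigned char zeros[4096]; /* static storage: all zero */
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r+b"); /* update mode: the file must exist, no truncation */
    if (!f)
        return -1;
    while (nbytes > 0) {
        size_t chunk = nbytes < sizeof zeros ? nbytes : sizeof zeros;

        if (fwrite(zeros, 1, chunk, f) != chunk) {
            fclose(f);
            return -1;
        }
        nbytes -= chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling zero_file_head("download.file", 16 * 1024) would then zero the first 16k and keep the file length unchanged, so the hash check should only flag the piece(s) covering that range.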

Let me offer to test until Saturday afternoon CET,
I have a cloned git repository, downloaded torrent files and "apt".

My systems that are affected are:

Linux oscar 2.6.18 SMP (2x450Mhz Intel P3)
(rolled back to 2.6.18 but can boot latest git)

Linux tony 2.6.20-git UP
(can be tested using all kinds of "apt" operations)

Both machines are using:
IDE  -> MD-RAID1 -> LVM -> EXT3 (data=ordered)
SCSI -> MD-RAID5 -> .

I don't want to disturb your technical discussion,
just offering some help in testing.

Regards,
Patrick


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote:
> sh-3.1# ls -l /var/cache/apt/
> total 5252
> drwxr-xr-x 3 root root    12288 Dec 22 04:41 archives
> -rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin
> -rw-r--r-- 1 root root     8554 Dec 22 04:45 srcpkgcache.bin

This listing is a little different to what I got. For me,
srcpkgcache.bin did not exist when apt eventually finished. Did you
notice whether the install took a lot longer than usual?

> Gordon, does it fail for you where it normally does (installing
> initramfs-tools) or much later?  For me, the installer was able to
> install initramfs-tools and the kernel, but apt now hangs at "Select
> and install software".

apt didn't hang for me; it just took 20 to 30 minutes to complete
building the package database. Usually, it takes less than a minute.
The installer stopped because it could not find a kernel to install. I
have seen this failure mode before and, as you have previously pointed
out, it is probably the same problem (corrupted apt cache files), just a
different manifestation.

Gordon

--
Gordon Farquharson

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson


On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> Andrew located at least one bug: we run cancel_dirty_page() too late in
> "truncate_complete_page()", which means that do_invalidatepage() ends up
> not clearing the page cache.
>
> His patch is appended.

Thanks. I'll try this out later today.

> But it sounds like I probably misunderstood something, because I thought
> that Martin had acknowledged that this patch actually worked for him.
> Which sounded very similar to your setup (he has a 32M ARM box too, no?)

Yup, we have the same machines (Linksys NSLU2) and are running the
same test case (installing Debian). However, I'm not sure what kernel
version he had used for his latest test. I presumed 2.6.20-git,
whereas I had used 2.6.19.

> Maybe it's mount option issue? I've got data=ordered on my machine, are
> you perhaps running with something else?

We are also using ordered.

/dev/scsi/host0/bus0/target0/lun0/part1 /target ext3 rw,data=ordered 0 0

Gordon

--
Gordon Farquharson

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra


A cleanup of try_to_unmap. I have not identified any races that this
would solve, but it is worth doing for consistency's sake.

Also includes a small s390 optimization by moving
page_test_and_clear_dirty() out of the vma iteration.


From: Peter Zijlstra <[EMAIL PROTECTED]>

We clear the page in the following sequence:
  ClearPageDirty - lock ptl, clear pte, unlock ptl

hence we should dirty in the opposite order:
  lock ptl, clear pte, unlock ptl - SetPageDirty

try_to_unmap_one violates this by doing the SetPageDirty under the ptl.

Also move page_test_and_clear_dirty() to try_to_unmap().

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/rmap.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -590,8 +590,6 @@ void page_remove_rmap(struct page *page)
 		 * Leaving it set also helps swapoff to reinstate ptes
 		 * faster for those pages still in swapcache.
 		 */
-		if (page_test_and_clear_dirty(page))
-			set_page_dirty(page);
 		__dec_zone_page_state(page,
 				PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
 	}
@@ -610,6 +608,7 @@ static int try_to_unmap_one(struct page 
 	pte_t pteval;
 	spinlock_t *ptl;
 	int ret = SWAP_AGAIN;
+	struct page *dirty_page = NULL;
 
 	address = vma_address(page, vma);
 	if (address == -EFAULT)
@@ -636,7 +635,7 @@ static int try_to_unmap_one(struct page 
 
 	/* Move the dirty bit to the physical page now the pte is gone. */
 	if (pte_dirty(pteval))
-		set_page_dirty(page);
+		dirty_page = page;
 
 	/* Update high watermark before we lower rss */
 	update_hiwater_rss(mm);
@@ -687,6 +686,8 @@ static int try_to_unmap_one(struct page 
 
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
+	if (dirty_page)
+		set_page_dirty(dirty_page);
 out:
 	return ret;
 }
@@ -918,6 +919,9 @@ int try_to_unmap(struct page *page, int 
 	else
 		ret = try_to_unmap_file(page, migration);
 
+	if (page_test_and_clear_dirty(page))
+		set_page_dirty(page);
+
 	if (!page_mapped(page))
 		ret = SWAP_SUCCESS;
 	return ret;



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra

On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote:
> * Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]:
> > I've completed one installation with Linus' patch plus the two from
> > Andrew successfully, but I'm currently trying again...
> 
>  and it failed.

Since you are on ARM you might want to try with the page_mkclean_one
cleanup patch too.

Arjan agreed that the loop is not needed; we clear the pte, flush on all
CPUs and then re-establish the pte. Any race will fault and be
serialised on the pte lock.

FWIW - with today's -git and Andrew's second cancel_dirty_page() patch:
  http://lkml.org/lkml/2006/12/22/49
I am unable to trigger any corruption. Earlier I could still trigger it by
raising the number of seeds from 3 to 6 (I am currently at 10 seeds).




From: Peter Zijlstra <[EMAIL PROTECTED]>

fix page_mkclean_one()

 - add flush_cache_page() for all those virtual indexed cache
   architectures.

 - handle s390.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/rmap.c |   38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -432,7 +432,7 @@ static int page_mkclean_one(struct page 
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *pte, entry;
+	pte_t *pte;
 	spinlock_t *ptl;
 	int ret = 0;
 
@@ -444,17 +444,18 @@ static int page_mkclean_one(struct page 
 	if (!pte)
 		goto out;
 
-	if (!pte_dirty(*pte) && !pte_write(*pte))
-		goto unlock;
+	if (pte_dirty(*pte) || pte_write(*pte)) {
+		pte_t entry;
 
-	entry = ptep_get_and_clear(mm, address, pte);
-	entry = pte_mkclean(entry);
-	entry = pte_wrprotect(entry);
-	ptep_establish(vma, address, pte, entry);
-	lazy_mmu_prot_update(entry);
-	ret = 1;
+		flush_cache_page(vma, address, pte_pfn(*pte));
+		entry = ptep_clear_flush(vma, address, pte);
+		entry = pte_wrprotect(entry);
+		entry = pte_mkclean(entry);
+		set_pte_at(vma, address, pte, entry);
+		lazy_mmu_prot_update(entry);
+		ret = 1;
+	}
 
-unlock:
 	pte_unmap_unlock(pte, ptl);
 out:
 	return ret;
@@ -489,6 +490,8 @@ int page_mkclean(struct page *page)
 		if (mapping)
 			ret = page_mkclean_file(mapping, page);
 	}
+	if (page_test_and_clear_dirty(page))
+		ret = 1;
 
 	return ret;
 }


 


Re: 2.6.19 file content corruption on ext3

2006-12-22 Thread Daniel Drake


Marc Haber wrote:

After updating to 2.6.19, Debian's apt control file
/var/cache/apt/pkgcache.bin corrupts pretty frequently - like in under
six hours. In that situation, "aptitude update" segfaults. When I
delete the file and have apt recreate it, things are fine again for a
few hours before the file is broken again and the segfaults start over.
In all cases, umounting the file system and doing an fsck does not
show issues with the file system.


Are you using wireless networking of any kind? If so which driver and 
security key system? Might be useful if you could post 'dmesg' output so 
that people can see the other hardware that you have.


Daniel


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]:
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again...

... and it failed.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> With all three patches I have corruption

I've completed one installation with Linus' patch plus the two from
Andrew successfully, but I'm currently trying again... but I really
need a better testcase since an installation takes about an hour.
Andrei, which torrent do you download as a testcase?  It would be good
if someone could suggest a torrent which is legal and not too large.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa

With all three patches I have corruption


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/include/asm-generic/pgtable.h
b/include/asm-generic/pgtable.h
index 9d774d0..8879f1d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -61,31 +61,6 @@ ({   
\
 })
 #endif
 
-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
-#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\
-({ \
-   pte_t __pte = *__ptep;  \
-   int r = 1;  \
-   if (!pte_dirty(__pte))  \
-   r = 0;  \
-   else\
-   set_pte_at((__vma)->vm_mm, (__address), (__ptep),   \
-  pte_mkclean(__pte)); \
-   r;  \
-})
-#endif
-
-#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(__vma, __address, __ptep)   \
-({ \
-   int __dirty;\
-   __dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep);  \
-   if (__dirty)\
-   flush_tlb_page(__vma, __address);   \
-   __dirty;\
-})
-#endif
-
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear(__mm, __address, __ptep)\
 ({ \
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..b61d6f9 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -300,18 +300,20 @@ do {  
\
flush_tlb_page(vma, address);   \
 } while (0)
 
-#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(vma, address, ptep) \
-({ \
-   int __dirty;\
-   __dirty = pte_dirty(*(ptep));   \
-   if (__dirty) {  \
-   clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low);   \
-   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
-   flush_tlb_page(vma, address);   \
-   }   \
-   __dirty;

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Andrew Morton <[EMAIL PROTECTED]> [2006-12-22 02:17]:
> > This hunk (on top of git from about 2 days ago and your latest patch)
> > results in the installer hanging right at the start.
> 
> You'll need this also:

It starts again, thanks.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:10]:
> > immediately when I started wget, the hanging apt-get process
> > continued.
> ... and now that we've completed this step, the apt cache has suddenly
> been reduced (see Gordon's mail for an explanation) and it segfaults:

One of my questions was why apt-get worked to install the
initramfs-tools, the kernel and some other packages but later hung
while it was building the cache (which clearly it had built already to
install some packages): before the installer offers to install
additional packages, it changes the apt sources, which leads to apt
rebuilding the cache, and here it hangs.

Remember how I said that downloading a file with wget prompts apt to
work again?  Apparently any filesystem access will do (I just ran
find / > /dev/null).  Gordon, can you confirm this?
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrew Morton

On Fri, 22 Dec 2006 11:00:04 +0100
Martin Michlmayr <[EMAIL PROTECTED]> wrote:

> > -   if (TestClearPageDirty(page) && account_size)
> > +   if (TestClearPageDirty(page) && account_size) {
> > +   dec_zone_page_state(page, NR_FILE_DIRTY);
> > task_io_account_cancelled_write(account_size);
> > +   }
> 
> This hunk (on top of git from about 2 days ago and your latest patch)
> results in the installer hanging right at the start. 

You'll need this also:

From: Andrew Morton <[EMAIL PROTECTED]>

Only (un)account for IO and page-dirtying for devices which have real backing
store (ie: not tmpfs or ramdisks).

Cc: "David S. Miller" <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 mm/truncate.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN mm/truncate.c~truncate-dirty-memory-accounting-fix mm/truncate.c
--- a/mm/truncate.c~truncate-dirty-memory-accounting-fix
+++ a/mm/truncate.c
@@ -60,7 +60,8 @@ void cancel_dirty_page(struct page *page
WARN_ON(++warncount < 5);
}

-   if (TestClearPageDirty(page) && account_size) {
+   if (TestClearPageDirty(page) && account_size &&
+   mapping_cap_account_dirty(page->mapping)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
task_io_account_cancelled_write(account_size);
}
_
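
Why the extra mapping_cap_account_dirty() check matters can be shown with a toy counter. The sketch below assumes that dirtying a page only bumps the dirty counter when the mapping has real backing store; struct toy_mapping, struct toy_page and the toy_* helpers are invented for illustration and only imitate the kernel logic.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mapping { bool cap_account_dirty; };	/* false for tmpfs/ramdisk */
struct toy_page { bool dirty; struct toy_mapping *mapping; };

static long nr_file_dirty;	/* stands in for the NR_FILE_DIRTY counter */

/* Dirtying is only accounted for mappings with real backing store. */
static void toy_set_page_dirty(struct toy_page *p)
{
	if (!p->dirty) {
		p->dirty = true;
		if (p->mapping->cap_account_dirty)
			nr_file_dirty++;
	}
}

/* check_cap == false imitates the hunk before this fix. */
static void toy_cancel_dirty_page(struct toy_page *p, bool check_cap)
{
	if (p->dirty) {
		p->dirty = false;
		if (!check_cap || p->mapping->cap_account_dirty)
			nr_file_dirty--;
	}
}
```

Without the check, cancelling a dirty ramdisk page decrements a counter that was never incremented for it, so the count drifts negative. That would be consistent with an installer running from a ramdisk misbehaving, though the thread does not confirm the exact mechanism.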


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:06]:
> Okay, it's really weird.  So apt-get just hangs doing nothing and I
> cannot even kill it.  I just tried to download strace via wget and
> immediately when I started wget, the hanging apt-get process
> continued.

... and now that we've completed this step, the apt cache has suddenly
been reduced (see Gordon's mail for an explanation) and it segfaults:

sh-3.1# ls -l /var/cache/apt/
total 12524
drwxr-xr-x 3 root root   12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin
-rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin
sh-3.1# apt-get -f install
Reading package lists... Done
Segmentation faulty tree... 50%

-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:00]:
> This time, however, I let the installer continue and it seems that
> with your patch apt now works where it failed in the past, but it
> hangs later on.  It's pretty weird because I cannot even kill the
> process:

Okay, it's really weird.  So apt-get just hangs doing nothing and I
cannot even kill it.  I just tried to download strace via wget and
immediately when I started wget, the hanging apt-get process
continued.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-21 21:20]:
> generating these files, pkgcache.bin grows to 12582912 bytes, and when
> apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is
> 64254483 bytes. This time, when apt-get exited, it had only created
> pkgcache.bin which was still 12582912 bytes.

Yes, same here:

sh-3.1# ls -l /var/cache/apt/
total 5252
drwxr-xr-x 3 root root12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin
-rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin

Gordon, does it fail for you where it normally does (installing
initramfs-tools) or much later?  For me, the installer was able to
install initramfs-tools and the kernel, but apt now hangs at "Select
and install software".
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-21 20:54]:
> But it sounds like I probably misunderstood something, because I thought
> that Martin had acknowledged that this patch actually worked for him.

That's what I thought too but now I can confirm what Gordon sees.  But
it's pretty weird.  Our testcase is to run Debian installer on the
NSLU2 arm device and apt-get would either segfault or hang at this
particular spot in the installation (when apt is first run).  With
your patch, apt works correctly where it normally fails (at least for
me).  I stopped the installation at this point and repeated it several
more times to make sure it's really working.  And, yes, I can repeat
this result.

This time, however, I let the installer continue and it seems that
with your patch apt now works where it failed in the past, but it
hangs later on.  It's pretty weird because I cannot even kill the
process:

sh-3.1# ps aux | grep 31126
root 31126  5.7 20.6  16240  6076 ?R+   04:45   0:21 apt-get -o 
APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install 
popularity-contest
root 31157  0.0  1.6   1516   492 ttyS0S+   04:51   0:00 grep 31126
sh-3.1# kill -9 31126
sh-3.1# kill -9 31126
sh-3.1# ps aux | grep 31126
root 31126  5.6 20.6  16240  6076 ?R+   04:45   0:21 apt-get -o 
APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install 
popularity-contest
root 31159  0.0  1.6   1516   492 ttyS0S+   04:51   0:00 grep 31126
sh-3.1#

> Which sounded very similar to your setup (he has a 32M ARM box too, no?)

It's the same device, a Linksys NSLU2.

> Author: Andrew Morton <[EMAIL PROTECTED]>

This patch makes it even worse for me.

> - if (TestClearPageDirty(page) && account_size)
> + if (TestClearPageDirty(page) && account_size) {
> + dec_zone_page_state(page, NR_FILE_DIRTY);
>   task_io_account_cancelled_write(account_size);
> + }

This hunk (on top of git from about 2 days ago and your latest patch)
results in the installer hanging right at the start.  The Linux kernel
boots fine, the debian-installer is loaded into a ramdisk but when
ncurses is being started it just hangs.  Reverting this hunk makes it
start again.

Does that help or confuse you even more?
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds

On Thu, 21 Dec 2006, Gordon Farquharson wrote:
> 
> I tested 2.6.19 with a version of Linus's patch that applies cleanly
> to 2.6.19 (patch appended to the end of this email) on ARM and apt-get
> failed. It did not segfault this time, but instead got stuck for about
> 20 to 30 minutes and was accessing the hard drive frequently.

Ok, there's definitely something screwy going on.

Andrew located at least one bug: we run cancel_dirty_page() too late in 
"truncate_complete_page()", which means that do_invalidatepage() ends up 
not clearing the page cache.

His patch is appended.
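
The ordering problem can be illustrated with a toy model. It assumes only what the thread states: try_to_free_buffers() now refuses to touch a dirty page, so the dirty state has to be cancelled before the buffers are invalidated. struct toy_page and the toy_* and truncate_*_order() names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; the fields are illustrative only. */
struct toy_page {
	bool dirty;
	bool has_buffers;
};

/* Mirrors the new rule: refuse to drop buffers from a dirty page. */
static bool toy_try_to_free_buffers(struct toy_page *p)
{
	if (p->dirty)
		return false;
	p->has_buffers = false;
	return true;
}

static void toy_cancel_dirty(struct toy_page *p)
{
	p->dirty = false;
}

/* Old order: invalidate first, cancel the dirty state afterwards. */
static bool truncate_old_order(struct toy_page *p)
{
	bool freed = toy_try_to_free_buffers(p);

	toy_cancel_dirty(p);
	return freed;
}

/* New order: cancel the dirty state first, so the buffers can go. */
static bool truncate_new_order(struct toy_page *p)
{
	toy_cancel_dirty(p);
	return toy_try_to_free_buffers(p);
}
```

In the old order the page ends up clean but still carries its buffers; in the new order the buffers are released.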

But it sounds like I probably misunderstood something, because I thought 
that Martin had acknowledged that this patch actually worked for him. 
Which sounded very similar to your setup (he has a 32M ARM box too, no?)

And your failure sounds a lot like one that David Miller is reporting. At 
the same time, my own shared file mmap tests on my own machines obviously 
work fine (I lower the dirty-writeback thresholds to force writeback more 
easily, and then mmap a file and write and rewrite to it in memory, and 
truncate it).
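
A shared-mmap test of the kind described above might look roughly like the following. This is not the actual test program, just a minimal sketch: the function name shared_mmap_roundtrip(), the 64 KiB size, the number of rewrite passes and the /tmp location are all arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define LEN ((size_t)64 * 1024)

/*
 * Write and rewrite a file through a MAP_SHARED mapping, sync it, compare
 * the file contents with the mapping, then truncate the file to half size.
 * Returns 0 if everything matched, -1 on any failure or mismatch.
 */
static int shared_mmap_roundtrip(void)
{
	char path[] = "/tmp/mmap-test-XXXXXX";
	static unsigned char buf[LEN];
	unsigned char *map;
	struct stat st;
	int same, ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file lives on until fd is closed */
	if (ftruncate(fd, (off_t)LEN) != 0)
		goto out;
	map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	for (int pass = 0; pass < 4; pass++)	/* write and rewrite in memory */
		for (size_t i = 0; i < LEN; i++)
			map[i] = (unsigned char)(i + (size_t)pass);

	if (msync(map, LEN, MS_SYNC) != 0) {
		munmap(map, LEN);
		goto out;
	}
	same = pread(fd, buf, LEN, 0) == (ssize_t)LEN &&
	       memcmp(buf, map, LEN) == 0;
	munmap(map, LEN);		/* unmap before shrinking the file */

	if (same && ftruncate(fd, (off_t)(LEN / 2)) == 0 &&
	    fstat(fd, &st) == 0 && st.st_size == (off_t)(LEN / 2))
		ret = 0;
out:
	close(fd);
	return ret;
}
```

A run like this only exercises the common path. Reproducing the corruption discussed here would additionally need memory pressure or lowered writeback thresholds, which the sketch does not set up.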

Maybe it's a mount option issue? I've got data=ordered on my machine, are 
you perhaps running with something else?

Linus

---
commit 3e67c0987d7567ad41164a153dca9a43b11d
Author: Andrew Morton <[EMAIL PROTECTED]>
Date:   Thu Dec 21 11:00:33 2006 -0800

[PATCH] truncate: clear page dirtiness before running try_to_free_buffers()

truncate presently invalidates the dirty page's buffer_heads then shoots down
the page.  But try_to_free_buffers() will now bale out because the page is
dirty.

Net effect: the LRU gets filled with dirty pages which have invalidated
buffer_heads attached.  They have no ->mapping and hence cannot be cleaned.
The machine leaks memory at an enormous rate.

Fix this by cleaning the page before running try_to_free_buffers(), so
try_to_free_buffers() can do its work.

Also, remember to do dirty-page-accounting in cancel_dirty_page() so the
machine won't wedge up trying to write non-existent dirty pages.

Probably still wrong, but now less so.

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

diff --git a/mm/truncate.c b/mm/truncate.c
index bf9e296..89a5c35 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -60,11 +60,12 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
WARN_ON(++warncount < 5);
}

-   if (TestClearPageDirty(page) && account_size)
+   if (TestClearPageDirty(page) && account_size) {
+   dec_zone_page_state(page, NR_FILE_DIRTY);
task_io_account_cancelled_write(account_size);
+   }
 }

-
 /*
  * If truncate cannot remove the fs-private metadata from the page, the page
  * becomes anonymous.  It will be left on the LRU and may even be mapped into
@@ -81,11 +82,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
if (page->mapping != mapping)
return;

+   cancel_dirty_page(page, PAGE_CACHE_SIZE);
+
if (PagePrivate(page))
do_invalidatepage(page, 0);

-   cancel_dirty_page(page, PAGE_CACHE_SIZE);
-
ClearPageUptodate(page);
ClearPageMappedToDisk(page);
remove_from_page_cache(page);

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Gordon Farquharson

On 12/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:

> Can the call to task_io_account_cancelled_write() simply be removed
> from cancel_dirty_page() for testing the patch with 2.6.19 (since
> 2.6.19 doesn't seem to have the task I/O accounting) ?

Yes.

I tested 2.6.19 with a version of Linus's patch that applies cleanly
to 2.6.19 (patch appended to the end of this email) on ARM and apt-get
failed. It did not segfault this time, but instead got stuck for about
20 to 30 minutes and was accessing the hard drive frequently.

Here is some background about the problem we see with apt which may
help somebody with knowledge of the apt source code analyse the
problem in the context of the patch. When apt-get is first run, it
generates pkgcache.bin and srcpkgcache.bin in /var/cache/apt. We have
found that these are the files that get corrupted when we apply the
patch "mm: tracking shared dirty pages" [1] to 2.6.18. The corruption
of these files is what causes apt-get to segfault. I have observed
that the normal operation of apt-get is that while apt-get is
generating these files, pkgcache.bin grows to 12582912 bytes, and when
apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is
64254483 bytes. This time, when apt-get exited, it had only created
pkgcache.bin which was still 12582912 bytes. Also, the patch caused
apt to slow down a lot. I ran apt-get -f install after apt had exited,
and it took so long that I killed it before it had finished.

I did not try 2.6.20-git, but I presume that this version is what
Martin tried earlier. Maybe Linus's patch doesn't work with 2.6.19,
because 2.6.19 is missing some other patch.

Gordon

[1] 
http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(&mapping->private_lock);
   ret = drop_buffers(page, &buffers_to_free);
   spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c
linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h
linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21
01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

 struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
 int test_clear_page_writeback(struct page *page);
 int test_set_page_writeback(struct page *page);

-static inline void clear_page_dirty(struct page *page)
-{
-   test_clear_page_dirty(page);
-}
-
 static inline void set_page_writeback(struct page *page)
{
   test_set_page_writeback(page);
diff -Naupr linux-2.6.19.orig/mm/memory.c linux-2.6.19/mm/memory.c
--- linux-2.6.19.orig/mm/memory.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/mm/memory.c2006-12-21 01:15:21.0 -0700
@@ -1832,6 +1832,33 @@ void unmap_mapping_range(struct address_
}
EXPORT_SYMBOL(unmap_mapping_range);

+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if

Re: 2.6.19 file content corruption on ext3

2006-12-21 Thread Andrew Morton

On Thu, 21 Dec 2006 14:03:20 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Tue, 2006-12-19 at 09:43 -0800, Linus Torvalds wrote:
> > 
> > Btw,
> >  here's a totally new tangent on this: it's possible that user code is 
> > simply BUGGY. 
> 
> depmod: BADNESS: written outside isize 22183

akpm:/usr/src/module-init-tools-3.3-pre1> grep -r mmap .
./zlibsupport.c:map = mmap(0, *size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);

So presumably it's in a library.

akpm:/usr/src/25> ldd /sbin/depmod
linux-gate.so.1 =>  (0xe000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x46afa000)
/lib/ld-linux.so.2 (0x4631d000)

worrisome.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds

On Thu, 21 Dec 2006, Peter Zijlstra wrote:
> 
> Also, I'm dubious about the while thing and stuck a WARN_ON(ret) thing
> at the beginning of the loop. flush_tlb_page() does IPI the other cpus
> to flush their tlb too, so there should not be a SMP race, Arjan?

Now, the reason I think the loop may be needed is:

        CPU#0                           CPU#1
        -----                           -----
        load old PTE entry
        clear dirty and WP bits
                                        write to page using old PTE
                                        NOT CHECKING that the new one
                                        is write-protected, and just
                                        setting the dirty bit blindly
                                        (but atomically)
        flush_tlb_page()
                                        TLB flushed, but we now have a
                                        page that is marked dirty and
                                        unwritable in the page tables,
                                        and we will mark it clean in
                                        "struct page *"
Now, the scary thing is, IF a CPU does this, then the way we do all this, 
we may actually have the following sequence:

        CPU#0                           CPU#1
        -----                           -----
                                        load old PTE entry
        ptep_clear_flush():
                                        atomic "set dirty bit" sequence
                                        PTEP now contains 040 !!!
        flush_tlb_page();
                                        TLB flushed, but PTEP is still
                                        "dirty zero"
        write the clear/readonly PTE
        THE DIRTY BIT WAS LOST!

which might actually explain this bug.
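
The second sequence can be replayed deterministically in user space with C11 atomics. This is only a sketch of the hypothesis, assuming a CPU that sets the dirty bit blindly; the bit values and the names hw_blind_set_dirty(), clear_then_store() and exchange_with_valid_entry() are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical PTE bit layout, chosen to resemble x86. */
#define PTE_PRESENT 0x001UL
#define PTE_WRITE   0x002UL
#define PTE_DIRTY   0x040UL

/* Models a CPU that sets the dirty bit blindly (but atomically). */
static void hw_blind_set_dirty(_Atomic unsigned long *ptep)
{
	atomic_fetch_or(ptep, PTE_DIRTY);
}

/*
 * Clear the PTE, let the other CPU race, then blindly store the new
 * entry. Returns 1 if the dirty bit was noticed, 0 if it was lost.
 */
static int clear_then_store(void)
{
	_Atomic unsigned long pte = PTE_PRESENT | PTE_WRITE;
	unsigned long old = atomic_exchange(&pte, 0UL);	/* clear */

	hw_blind_set_dirty(&pte);			/* racing write */
	assert(atomic_load(&pte) == PTE_DIRTY);		/* the "dirty zero" entry */
	atomic_store(&pte, old & ~(PTE_DIRTY | PTE_WRITE));	/* overwrite */
	return (old & PTE_DIRTY) != 0;
}

/*
 * Never store blindly: exchange in a valid clean entry and inspect the
 * old value that comes back.
 */
static int exchange_with_valid_entry(void)
{
	_Atomic unsigned long pte = PTE_PRESENT | PTE_WRITE;
	unsigned long entry = atomic_load(&pte) & ~(PTE_DIRTY | PTE_WRITE);
	unsigned long old;

	hw_blind_set_dirty(&pte);			/* racing write lands first */
	old = atomic_exchange(&pte, entry);		/* old bits are observed */
	return (old & PTE_DIRTY) != 0;
}
```

In this model the clear-then-store variant misses the dirty bit, while the exchange variant sees it in the returned old value. A write that lands after the exchange would leave the dirty bit set in the new entry, which is presumably what a later check after the TLB flush is meant to catch.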

I personally _thought_ that Intel CPUs don't actually do a "set dirty 
bit atomically" sequence, but more of a "set dirty bit but trap if the TLB 
is nonpresent" thing, but I have absolutely no proof for that.

Anyway, IF this is the case, then the following patch may or may not fix 
things. It avoids things by never overwriting a PTE entry, not even the 
"cleared" one. It always does an atomic "xchg()" with a valid new entry, 
and looks at the old bits.

What do you guys think? Does something like this work out for S/390 too? I 
tried to make that "ptep_flush_dirty()" concept work for architectures 
that hide the dirty bit somewhere else too, but..

It actually simplifies the architecture-specific code (you just need to 
implement a trivial "ptep_exchange()" and "ptep_flush_dirty()" macro), but 
I only did x86-64 and i386, and while I've booted with this, I haven't 
really given the thing a lot of really _deep_ thought.

But I think this might be safer, as per above.. And it _might_ actually 
explain the problem. Exactly because the "ptep_clear() + blindly assign to 
ptep" might lose a dirty bit that was written by another CPU.

But this really does depend on what a CPU does when it marks a page dirty. 
Does it just blindly write the dirty bit? Or does it actually _validate_ 
that the old page table entry was still present and writable?

This patch makes no assumptions. It should work even if a CPU just writes 
the dirty bit blindly, and the only expectation is that the page tables 
can be accessed atomically (which had _better_ be true on any SMP 
architecture)

Arjan, can you please check within Intel, and ask what the "proper" 
sequence for doing something like this is?

Linus

commit 301d2d53ca0e5d2f61b1c1c259da410c7ee6d6a7
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Thu Dec 21 11:11:05 2006 -0800

Rewrite the page table "clear dirty and writable" accesses

This is much simpler for most architectures, and allows us to do the
dirty and writable clear in a single operation without any races or any
double flushes.

It's also much more careful: we never overwrite the old dirty bits at
any time, and always make sure to do atomic memory ops to exchange and
see the old value.

Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 9d774d0..8879f1d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -61,31 +61,6 @@ do { 
  \
 })
 #endif

-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
-#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\
-({ \
-   pte_t __pte = *__ptep;  \
-   int r = 1;  \
-   if (!pte_dirty(__pte))  \
-   r = 0

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds

On Wed, 20 Dec 2006, Trond Myklebust wrote:
> 
> I can't see that it is the business of invalidate_inode_pages2() to
> resolve races between ->direct_IO() and pages that are redirtied by
> mmap(). All it needs to ensure is that pages that are clean are discarded,
> since those are neither consistent with data that the ->directIO() call
> wrote to the disk nor are they scheduled to be written to disk.

Sure, we could happily just remove the -EIO. Alternatively, we could still 
do all the invalidates over the whole range, and return -EIO at the end if 
any of the pages weren't invalidated because they had to be written back. 

I don't personally care whether we should just return success or something 
to indicate that there were busy pages, but somebody who _uses_ direct-IO 
might want to know that the thing didn't throw away everything. If you 
know such users, can you ask them?

(Maybe "-EAGAIN" is better than "-EIO", since it's not really even a fatal 
error).
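
The "invalidate everything, report at the end" alternative could be sketched like this. It is a toy loop over an array, not the real invalidate_inode_pages2(); struct toy_page, toy_invalidate_range() and the busy_err parameter are hypothetical, with the error code left to the caller since the choice between -EIO and -EAGAIN is still open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cached page; fields are illustrative only. */
struct toy_page {
	bool dirty;	/* would have to be written back first */
	bool cached;	/* still present in the page cache */
};

/*
 * Drop every clean page in the range; skip busy ones but keep going.
 * Returns 0 if everything was dropped, or -busy_err if any page stayed.
 */
static int toy_invalidate_range(struct toy_page *pages, size_t n, int busy_err)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty) {
			ret = -busy_err;	/* remember, do not stop */
			continue;
		}
		pages[i].cached = false;
	}
	return ret;
}
```

A caller that passes EAGAIN gets a non-fatal hint that some pages were left behind, while all clean pages in the range have still been discarded.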

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds

On Thu, 21 Dec 2006, Andrei Popa wrote:

> On Wed, 2006-12-20 at 16:24 -0800, Linus Torvalds wrote:
> > 
> > Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may be 
> > talking about different bugs, so _both_ of your experiences definitely 
> > matter here).
> 
> with http://lkml.org/lkml/diff/2006/12/20/204/1
> I have corruption: Hash check on download completion found bad chunks,
> consider using "safe_sync".

Gaah. Martin Michlmayr reported that it apparently fixes his ARM 
corruption.

Now, admittedly I already suspected the issues might be different (if only 
because of the UP vs SMP/PREEMPT case), but I really had my hopes up after 
Martin's report, because if anything, _his_ issue might have been a 
superset of your problem (while obviously any subtle SMP races you might 
be seeing are definitely not an issue in his case).

Oh well. I think the ARM case is enough of a reason to apply those patches 
(if it hadn't made any difference at all, I'd have waited until after 
2.6.20), and we'll just have to continue on the SMP PREEMPT angle.

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Andrei Popa

On Wed, 2006-12-20 at 16:24 -0800, Linus Torvalds wrote:
> 
> Btw, I'd really love to hear whether the patch I sent out actually _helps_ 
> at all, or whether we're just discussing something that in the end is just 
> a cleanup..
> 
> Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may be 
> talking about different bugs, so _both_ of your experiences definitely 
> matter here).

with http://lkml.org/lkml/diff/2006/12/20/204/1
I have corruption: Hash check on download completion found bad chunks,
consider using "safe_sync".

> 
>   Linus


Re: mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Johansson


> On Wed, 20 Dec 2006, Linus Torvalds wrote:
> Martin, Andrei, does this make any difference for your corruption cases?

Hi!
I've been following this issue because I have been seeing rtorrent corruption 
since 2.6.19.


Details: i386, UP, no preempt:
kungen:/proc# zgrep PREEMPT config.gz
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
kungen:/proc# uname -a
Linux kungen.fatbob.nu 2.6.19.1 #3 Thu Dec 21 13:18:06 CET 2006 i686 GNU/Linux


Corruption is still present with the patch below (patched against 
2.6.19.1 and removed task_io_account_cancelled_write call)


/Martin

[Not subscribed to the list]

> ---
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d1f1b54..263f88e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
>  int ret = 0;
>  
>  BUG_ON(!PageLocked(page));

> -if (PageWriteback(page))
> +if (PageDirty(page) || PageWriteback(page))
>  return 0;
>  
>  if (mapping == NULL) {/* can this still happen? */

> @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
>  spin_lock(&mapping->private_lock);
>  ret = drop_buffers(page, &buffers_to_free);
>  spin_unlock(&mapping->private_lock);
> -if (ret) {
> -/*
> - * If the filesystem writes its buffers by hand (eg ext3)
> - * then we can have clean buffers against a dirty page.  We
> - * clean the page here; otherwise later reattachment of buffers
> - * could encounter a non-uptodate page, which is unresolvable.
> - * This only applies in the rare case where try_to_free_buffers
> - * succeeds but the page is not freed.
> - *
> - * Also, during truncate, discard_buffer will have marked all
> - * the page's buffers clean.  We discover that here and clean
> - * the page also.
> - */
> -if (test_clear_page_dirty(page))
> -task_io_account_cancelled_write(PAGE_CACHE_SIZE);
> -}
>  out:
>  if (buffers_to_free) {
>  struct buffer_head *bh = buffers_to_free;
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index ed2c223..4f4cd13 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct file *file,
>  
>  static void truncate_huge_page(struct page *page)

>  {
> -clear_page_dirty(page);
> +cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
>  ClearPageUptodate(page);
>  remove_from_page_cache(page);
>  put_page(page);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 4830a3b..350878a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -253,15 +253,11 @@ static inline void SetPageUptodate(struct page *page)
>  
>  struct page;/* forward declaration */
>  
> -int test_clear_page_dirty(struct page *page);
> +extern void cancel_dirty_page(struct page *page, unsigned int account_size);
> +
>  int test_clear_page_writeback(struct page *page);
>  int test_set_page_writeback(struct page *page);
>  
> -static inline void clear_page_dirty(struct page *page)

> -{
> -test_clear_page_dirty(page);
> -}
> -
>  static inline void set_page_writeback(struct page *page)
>  {
>  test_set_page_writeback(page);
> diff --git a/mm/memory.c b/mm/memory.c
> index c00bac6..79cecab 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(unmap_mapping_range);
>  
> +static void check_last_page(struct address_space *mapping, loff_t size)

> +{
> +pgoff_t index;
> +unsigned int offset;
> +struct page *page;
> +
> +if (!mapping)
> +return;
> +offset = size & ~PAGE_MASK;
> +if (!offset)
> +return;
> +index = size >> PAGE_SHIFT;
> +page = find_lock_page(mapping, index);
> +if (page) {
> +unsigned int check = 0;
> +unsigned char *kaddr = kmap_atomic(page, KM_USER0);
> +do {
> +check += kaddr[offset++];
> +} while (offset < PAGE_SIZE);
> +kunmap_atomic(kaddr,KM_USER0);
> +unlock_page(page);
> +page_cache_release(page);
> +if (check)
> +printk("%s: BADNESS: truncate check %u\n", current->comm, check);
> +}
> +}
> +
>  /**
>   * vmtruncate - unmap mappings "freed" by truncate() syscall
>   * @inode: inode of the file used
> @@ -1875,6 +1902,7 @@ do_expand:
>  goto out_sig;
>  if (offset > inode->i_sb->s_maxbytes)
>  goto out_big;
> +check_last_page(mapping, inode->i_size);
>  i_size_write(inode, offset);
>  
>  out_truncate:

> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 237107c..b3a198c 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -845,38 +845,6 @@ int set_page_dirty_lock(struct page *page)
>  EXPORT_SYMBOL(set_page

Re: 2.6.19 file content corruption on ext3

2006-12-21 Thread Peter Zijlstra

On Tue, 2006-12-19 at 09:43 -0800, Linus Torvalds wrote:
> 
> Btw, here's a totally new tangent on this: it's possible that user code is
> simply BUGGY.

depmod: BADNESS: written outside isize 22183

---
diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..5db9fd9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2393,6 +2393,17 @@ int nobh_commit_write(struct file *file, struct page *page,
 }
 EXPORT_SYMBOL(nobh_commit_write);
 
+static void __check_tail_zero(char *kaddr, unsigned int offset)
+{
+   unsigned int check = 0;
+   do {
+   check += kaddr[offset++];
+   } while (offset < PAGE_CACHE_SIZE);
+   if (check)
+   printk(KERN_ERR "%s: BADNESS: written outside isize %u\n",
+   current->comm, check);
+}
+
 /*
  * nobh_writepage() - based on block_full_write_page() except
  * that it tries to operate without attaching bufferheads to
@@ -2437,6 +2448,7 @@ int nobh_writepage(struct page *page, get_block_t *get_block,
 * writes to that region are not written out to the file."
 */
kaddr = kmap_atomic(page, KM_USER0);
+   __check_tail_zero(kaddr, offset);
memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
flush_dcache_page(page);
kunmap_atomic(kaddr, KM_USER0);
@@ -2604,6 +2616,7 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 * writes to that region are not written out to the file."
 */
kaddr = kmap_atomic(page, KM_USER0);
+   __check_tail_zero(kaddr, offset);
memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
flush_dcache_page(page);
kunmap_atomic(kaddr, KM_USER0);
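
For reference, a minimal userspace sketch of the same tail check, assuming a 4096-byte page. The names `TOY_PAGE_SIZE` and `toy_tail_nonzero` are made up for illustration. Unlike the debug helper in the patch, it counts non-zero bytes instead of summing their values, since with a signed plain `char` a sum can cancel to zero (for example 0x01 followed by 0xff).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Count the non-zero bytes in page[offset .. TOY_PAGE_SIZE).
 * offset plays the role of (i_size & (PAGE_CACHE_SIZE - 1)): everything
 * from there to the end of the page lies beyond end-of-file and is
 * expected to be zero.
 */
static unsigned int toy_tail_nonzero(const unsigned char *page,
				     unsigned int offset)
{
	unsigned int count = 0;

	assert(page != NULL && offset <= TOY_PAGE_SIZE);

	for (unsigned int i = offset; i < TOY_PAGE_SIZE; i++) {
		if (page[i] != 0)
			count++;
	}
	return count;
}
```

A non-zero return means something was written past the file size within the last page, which is the condition the BADNESS printk above reports.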



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Russell King

On Thu, Dec 21, 2006 at 12:30:22PM +0000, Russell King wrote:
> On Wed, Dec 20, 2006 at 11:53:25PM -0800, Linus Torvalds wrote:
> > That's obviously a bug worth fixing on its own. Do you know when it 
> > started?
> 
> My last merge, just before 2.6.19-rc1.

Obviously 2.6.20-rc1.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Russell King

On Wed, Dec 20, 2006 at 11:53:25PM -0800, Linus Torvalds wrote:
> That's obviously a bug worth fixing on its own. Do you know when it 
> started?

My last merge, just before 2.6.19-rc1.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Michlmayr

* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-20 11:50]:
> Martin, Andrei, does this make any difference for your corruption
> cases?

Works for me.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Russell King

On Thu, Dec 21, 2006 at 09:18:45AM +0100, Martin Michlmayr wrote:
> * Russell King <[EMAIL PROTECTED]> [2006-12-20 22:11]:
> > > This patch doesn't fix my problem (apt segfaults on ARM because its
> > > database is corrupted).
> > 
> > Are you using IDE in PIO mode?  If so, the bug probably lies there.
> 
> I'm using usb-storage.  It's used to access an external IDE drive in
> a USB enclosure but I don't think it matters that it's IDE since
> we're using the SCSI layer to talk to it, right?

USB generally uses DMA so you're probably safe.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Schwidefsky

On Thu, 2006-12-21 at 10:20 +0100, Peter Zijlstra wrote:
> > Now you are flushing the tlb twice. ptep_clear_flush clears the pte and
> > flushes the tlb, ptep_establish sets the new pte and flushes the tlb.
> > Not good. Use set_pte_at instead of the ptep_establish.
> 
> Yeah, sorry, I already noticed and corrected that :-|
> 
> Also, I'm dubious about the while thing and stuck a WARN_ON(ret) thing
> at the beginning of the loop. flush_tlb_page() does IPI the other cpus
> to flush their tlb too, so there should not be a SMP race, Arjan?

The while loop is protected by the pte lock and flush_tlb_page has to
remove the tlbs on all cpus. So yes, I think the while loop is not
necessary.

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Andrew Morton

On Thu, 21 Dec 2006 02:17:05 -0700
"Gordon Farquharson" <[EMAIL PROTECTED]> wrote:

> Can the call to task_io_account_cancelled_write() simply be removed
> from cancel_dirty_page() for testing the patch with 2.6.19 (since
> 2.6.19 doesn't seem to have the task I/O accounting) ?

Yes.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Peter Zijlstra

On Thu, 2006-12-21 at 10:16 +0100, Martin Schwidefsky wrote:
> On Thu, 2006-12-21 at 00:03 +0100, Peter Zijlstra wrote:
> > current version
> 
> Nitpicking ..
> 
> > @@ -444,17 +444,18 @@ static int page_mkclean_one(struct page
> > if (!pte)
> > goto out;
> > 
> > -   if (!pte_dirty(*pte) && !pte_write(*pte))
> > -   goto unlock;
> > +   while (pte_dirty(*pte) || pte_write(*pte)) {
> > +   pte_t entry;
> > 
> > -   entry = ptep_get_and_clear(mm, address, pte);
> > -   entry = pte_mkclean(entry);
> > -   entry = pte_wrprotect(entry);
> > -   ptep_establish(vma, address, pte, entry);
> > -   lazy_mmu_prot_update(entry);
> > -   ret = 1;
> > +   flush_cache_page(vma, address, pte_pfn(*pte));
> > +   entry = ptep_clear_flush(vma, address, pte);
> > +   entry = pte_wrprotect(entry);
> > +   entry = pte_mkclean(entry);
> > +   ptep_establish(vma, address, pte, entry);
> 
> Now you are flushing the tlb twice. ptep_clear_flush clears the pte and
> flushes the tlb, ptep_establish sets the new pte and flushes the tlb.
> Not good. Use set_pte_at instead of the ptep_establish.

Yeah, sorry, I already noticed and corrected that :-|

Also, I'm dubious about the while thing and stuck a WARN_ON(ret) thing
at the beginning of the loop. flush_tlb_page() does IPI the other cpus
to flush their tlb too, so there should not be a SMP race, Arjan?

> > +   lazy_mmu_prot_update(entry);
> > +   ret = 1;
> > +   }
> > 
> > -unlock:
> > pte_unmap_unlock(pte, ptl);
> >  out:
> > return ret;
> 


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Gordon Farquharson


On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> That said, I think the patch I sent out should actually work on top of
> plain 2.6.19 too. I don't think things have changed in this area that
> much. IOW, you don't _need_ latest -git to test it, you just need a broken
> kernel ;)


I created a version of your patch that applied to 2.6.19, but it
doesn't compile:

mm/built-in.o: In function `cancel_dirty_page':
slab.c:(.text+0x8964): undefined reference to `task_io_account_cancelled_write'
make[3]: *** [.tmp_vmlinux1] Error 1

It looks like task_io_account_cancelled_write() was added in

http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7c3ab7381e79dfc7db14a67c6f4f3285664e1ec2

Can the call to task_io_account_cancelled_write() simply be removed
from cancel_dirty_page() for testing the patch with 2.6.19 (since
2.6.19 doesn't seem to have the task I/O accounting) ?
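
One way to picture the question is the userspace sketch below. It is only an illustration under stated assumptions: `TOY_TASK_IO_ACCOUNTING` is a made-up switch standing in for "this kernel has task I/O accounting", and `toy_dirty_page`, `toy_account_cancelled_write` and `toy_cancel_dirty_page` are hypothetical names, not the kernel functions. The toy version also returns a bool for easy checking, whereas the patch declares cancel_dirty_page() as returning void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef TOY_TASK_IO_ACCOUNTING
/* Accounting available: remember how many bytes of writeout were cancelled. */
static size_t toy_cancelled_bytes;
static void toy_account_cancelled_write(size_t bytes)
{
	toy_cancelled_bytes += bytes;
}
#else
/* No accounting (the 2.6.19 situation in this sketch): a no-op stub. */
static inline void toy_account_cancelled_write(size_t bytes)
{
	(void)bytes;
}
#endif

/* Hypothetical page with just a dirty bit. */
struct toy_dirty_page {
	bool dirty;
};

/*
 * Clear the dirty bit.  The accounting call is the only part that
 * depends on the accounting feature; the dirty-bit handling is the
 * same either way.  Returns true if the page was dirty.
 */
static bool toy_cancel_dirty_page(struct toy_dirty_page *page,
				  size_t account_size)
{
	assert(page != NULL);

	if (!page->dirty)
		return false;
	page->dirty = false;
	if (account_size)
		toy_account_cancelled_write(account_size);
	return true;
}
```

With the stub in place the dirty-bit logic is unchanged, which is the property that matters for testing the patch.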

Gordon

--
Gordon Farquharson

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Schwidefsky

On Thu, 2006-12-21 at 00:03 +0100, Peter Zijlstra wrote:
> current version

Nitpicking ..

> @@ -444,17 +444,18 @@ static int page_mkclean_one(struct page
>   if (!pte)
>   goto out;
> 
> - if (!pte_dirty(*pte) && !pte_write(*pte))
> - goto unlock;
> + while (pte_dirty(*pte) || pte_write(*pte)) {
> + pte_t entry;
> 
> - entry = ptep_get_and_clear(mm, address, pte);
> - entry = pte_mkclean(entry);
> - entry = pte_wrprotect(entry);
> - ptep_establish(vma, address, pte, entry);
> - lazy_mmu_prot_update(entry);
> - ret = 1;
> + flush_cache_page(vma, address, pte_pfn(*pte));
> + entry = ptep_clear_flush(vma, address, pte);
> + entry = pte_wrprotect(entry);
> + entry = pte_mkclean(entry);
> + ptep_establish(vma, address, pte, entry);

Now you are flushing the tlb twice. ptep_clear_flush clears the pte and
flushes the tlb, ptep_establish sets the new pte and flushes the tlb.
Not good. Use set_pte_at instead of the ptep_establish.

> + lazy_mmu_prot_update(entry);
> + ret = 1;
> + }
> 
> -unlock:
>   pte_unmap_unlock(pte, ptl);
>  out:
>   return ret;
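
To make the double-flush remark above concrete, here is a toy userspace model that only counts flushes. Everything in it is hypothetical (`toy_pte`, `toy_clear_flush`, `toy_establish`, `toy_set`, `toy_mkclean`); the helpers merely mimic the behaviour described for ptep_clear_flush, ptep_establish and set_pte_at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy PTE and a global flush counter; none of this is a kernel type. */
struct toy_pte {
	bool present;
	bool dirty;
	bool write;
};

static unsigned int toy_tlb_flushes;

/* Models ptep_clear_flush: clear the entry, flush, return the old value. */
static struct toy_pte toy_clear_flush(struct toy_pte *ptep)
{
	struct toy_pte old = *ptep;

	*ptep = (struct toy_pte){ 0 };
	toy_tlb_flushes++;
	return old;
}

/* Models ptep_establish as described above: install the entry and flush. */
static void toy_establish(struct toy_pte *ptep, struct toy_pte entry)
{
	*ptep = entry;
	toy_tlb_flushes++;
}

/* Models set_pte_at: install the entry, no flush. */
static void toy_set(struct toy_pte *ptep, struct toy_pte entry)
{
	*ptep = entry;
}

/* Write-protect and clean one entry; returns 1 if it changed anything. */
static int toy_mkclean(struct toy_pte *ptep, bool use_establish)
{
	struct toy_pte entry;

	assert(ptep != NULL);

	if (!ptep->dirty && !ptep->write)
		return 0;

	entry = toy_clear_flush(ptep);
	entry.write = false;
	entry.dirty = false;
	if (use_establish)
		toy_establish(ptep, entry);	/* second flush */
	else
		toy_set(ptep, entry);		/* no extra flush */
	return 1;
}
```

In this model the establish variant costs two flushes per cleaned entry and the set variant costs one, while the resulting entry is the same in both cases.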

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Linus Torvalds



On Thu, 21 Dec 2006, Martin Michlmayr wrote:
> 
> This is a known issue.  The following patch has been proposed
> http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=4030/1
> although I just noticed that it has been marked as "discarded".
> Apparently Russell King committed a better patch so this should be
> fixed in git when he sends his next pull request.

Ahh, ok. Then it might even be in the set of merges I did earlier today 
(and which should mirror out soon enough, hopefully).

Linus

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Michlmayr

* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-20 23:53]:
> > Unfortunately, I cannot get the latest git version of the kernel to
> > boot on the ARM machine on which Martin and I are experiencing the apt
> > segfault.
> 
> Ouch.
> 
> That's obviously a bug worth fixing on its own. Do you know when it
> started?

This is a known issue.  The following patch has been proposed
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=4030/1
although I just noticed that it has been marked as "discarded".
Apparently Russell King committed a better patch so this should be
fixed in git when he sends his next pull request.
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Martin Michlmayr

* Russell King <[EMAIL PROTECTED]> [2006-12-20 22:11]:
> > This patch doesn't fix my problem (apt segfaults on ARM because its
> > database is corrupted).
> 
> Are you using IDE in PIO mode?  If so, the bug probably lies there.

I'm using usb-storage.  It's used to access an external IDE drive in
a USB enclosure but I don't think it matters that it's IDE since
we're using the SCSI layer to talk to it, right?
-- 
Martin Michlmayr
http://www.cyrius.com/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Peter Zijlstra

On Wed, 2006-12-20 at 21:36 -0500, Trond Myklebust wrote:
> On Wed, 2006-12-20 at 23:15 +0100, Peter Zijlstra wrote:
> > I think this is also needed:
> 
> NAK
> 
> invalidate_inode_pages2() should _not_ be pretending that dirty pages
> are clean. This patch is incorrect both for the NFS usage and for the
> directIO usage.
> 
> In the latter case, if someone has the page mmapped, resulting in the
> page getting marked as dirty _after_ a directIO write, then it would be
> wrong to discard that data. Only dirty data from _before_ the directIO
> write needs to be discarded (and that is achieved by unmapping,
> then cleaning the page prior to the directIO call)...
> 
> For the NFS case, the race is a bit more tricky, since you have the
> "unstable write" case which means that the page is neither marked as
> dirty, nor is entirely clean ('cos we don't know that the server has
> committed the data to permanent storage yet).

Then this patch:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc1/2.6.20-rc1-mm1/broken-out/nfs-fix-nr_file_dirty-underflow.patch

is equally wrong, right?

