Re: [FIXED!] kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-11 Thread Joseph Cheek

this works fine in pre8.  thanks all!

Joseph Cheek wrote:

> copying files off a loopback-mounted vfat filesystem exposes this bug.
> test11 worked fine.
>
> loop.o built as module.  this hard crashes the machine, every time
> [PIII-450].  i don't know how to debug this, is there a FAQ?
>
> [transcribed by hand]:
>
> # mount -o loop /tmp/cdboot.288 /mnt/cd
> # cd /mnt/cd
> # cp menu.lst /tmp
> kernel BUG at buffer.c:827!
> invalid operand: 
> CPU: 0
> EIP: 0010:[]
> EFLAGS: 00010082
> eax: 001c ebx: c1d8fc60 ecx:  edx: 0001
> esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0
> ds: 0018 es: 0018 ss: 0018
> Process cp (pid 762, stackpage=c1793000)
> Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001
> c01610e1
>c1d8fc60 0001 c1cef420  c1cef420 c02c8ed8 c88df91c
> c1cef420
>0001 c88e0986 0007  0001 c02c8ed8 c02c8ee8
> c4f18800
> Call Trace: [] [] [] []
> [] [] [] []
>[] [] [] [] []
> [] [] [c0128720>]
>[] [] [c010b56b>]
> Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00
>
> as soon as i reboot i will look what's at buffer.c:827

thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare.  Support for the Revolution.  | [EMAIL PROTECTED]
CTO / Acting PM, Redmond Linux Project   | [EMAIL PROTECTED]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [FIXED!] kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-11 Thread Joseph Cheek

this works fine in pre8.  thanks all!

Joseph Cheek wrote:

 copying files off a loopback-mounted vfat filesystem exposes this bug.
 test11 worked fine.

 loop.o built as module.  this hard crashes the machine, every time
 [PIII-450].  i don't know how to debug this, is there a FAQ?

 [transcribed by hand]:

 # mount -o loop /tmp/cdboot.288 /mnt/cd
 # cd /mnt/cd
 # cp menu.lst /tmp
 kernel BUG at buffer.c:827!
 invalid operand: 
 CPU: 0
 EIP: 0010:[c013660c]
 EFLAGS: 00010082
 eax: 001c ebx: c1d8fc60 ecx:  edx: 0001
 esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0
 ds: 0018 es: 0018 ss: 0018
 Process cp (pid 762, stackpage=c1793000)
 Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001
 c01610e1
c1d8fc60 0001 c1cef420  c1cef420 c02c8ed8 c88df91c
 c1cef420
0001 c88e0986 0007  0001 c02c8ed8 c02c8ee8
 c4f18800
 Call Trace: [c01fe484] [c01fe95a] [c0130703] [c8895de3]
 [c88df91c] [c8894494] [c0160d2f] [c0160ead]
[c0161011] [c0137a49] [c0130703] [c8895de3] [c8894494]
 [c01284d3] [c012887b] [c0128720]
[c889448d] [c01349a7] [c010b56b]
 Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00

 as soon as i reboot i will look what's at buffer.c:827

thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare.  Support for the Revolution.  | [EMAIL PROTECTED]
CTO / Acting PM, Redmond Linux Project   | [EMAIL PROTECTED]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds



On Sat, 9 Dec 2000, Mikulas Patocka wrote:
> 
> I did a test. I disabled readahead except for reading all 4 buffers in
> map_4sectors.
> 
> I observed 14% slowdown on walking directories with find and 4% slowdown
> on grepping one my working directory (10M, 281 files).
> 
> If you can't make it otherwise you can rip readahead out. If there is a
> possibility to keep it, it would be better.

The absolutely best thing would be to keep the directories in the page
cache. At which point the whole issue becomes pretty moot and we could use
the page-cache readahead code. Which gets the end right, and can handle
stuff that isn't physically contiguous on disk.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Mikulas Patocka

On Sat, 9 Dec 2000, Alexander Viro wrote:

> On Sat, 9 Dec 2000, Andries Brouwer wrote:
> 
> > On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote:
> > 
> > > > @@ -1210,7 +1204,6 @@
> > > [breada()]
> > >   Umm... why do we keep it, in the first place? AFAICS the only
> > > in-tree user is hpfs_map_sector() and it doesn't look like we really
> > > need it there. OTOH, trimming the buffer.c down is definitely nice.
> > > Mikulas?
> > 
> > Throw it out. The number of users has diminished over time.
> > Recently isofs stopped using breada.
> > The hpfs use was broken, I fixed it a bit some time ago, but
> > there is nothing against throwing it out altogether, I think.
> 
> I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry)
> and it looks like we would be better off doing getblk() on affected sectors
> and ll_rw_block() on the whole bunch - we end up calling breada() for
> increasing block numbers with decreasing readahead window anyway.
> 
> So it probably should go - it gives no real win. Mikulas has the final
> word here - he is the HPFS maintainer, so...

I did a test. I disabled readahead except for reading all 4 buffers in
map_4sectors.

I observed 14% slowdown on walking directories with find and 4% slowdown
on grepping one my working directory (10M, 281 files).

If you can't make it otherwise you can rip readahead out. If there is a
possibility to keep it, it would be better.

Mikulas


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Andi Kleen

Linus Torvalds <[EMAIL PROTECTED]> writes:


> > Modulo the comments above - fine with me. However, there is stuff in
> > drivers/md that I don't understand. Ingo, could you comment on the use of
> > ->b_end_io there?
> 
> I already sent him mail about it for the same reason. 

How about sending mail to all the journaling FS groups too? -- the change is surely
to break all the JFS which use usually b_end_io to submit the data after the journal
has been flushed.


-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds



On Sat, 9 Dec 2000, Alexander Viro wrote:
>   Fine
> > -   atomic_inc(>b_count);
> 
>   Why? It's cleaner the old way - why bother postponing that until we
> lock the thing?

I had a rule: we always do the "lock_buffer()" and "b_count increment"
together with setting "b_end_io = end_buffer_io_async". Why? Because that
way we pair up the actions, and I could easily verify that every single
user of "end_buffer_io_async" will increment the count (that is
decremented in "end_buffer_io_async").

We never used to have any rules in this area, and it was sometimes hard to
match up the actions with each other.

> >  int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size)
> 
> > if (!page->buffers)
> > +   create_page_buffers(rw, page, dev, b, size);
> 
>   create_empty_buffers(page, dev, size);

Agreed.

>   Modulo the comments above - fine with me. However, there is stuff in
> drivers/md that I don't understand. Ingo, could you comment on the use of
> ->b_end_io there?

I already sent him mail about it for the same reason. 

>   Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't
> even check for buffers being mapped, let alone attempt to map them.
>   Fortunately, ext2 doesn't use it these days, but the rest of block
> filesystems...  WTF? fsync() merrily ignores mmap()'ed stuff?

fsync() has _always_ ignored mmap'ed stuff. 

If you want to get your mappings synchronized, you absolutely positively
have to call "msync()".

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread David Woodhouse

On Fri, 8 Dec 2000, Alexander Viro wrote:

> BTW, what do you think about the following:
>   * add_to_page_cache() is not exported and never used. Kill?

I have my eye on that for execute-in-place of romfs from real ROM chips -
making up struct pages for parts of the ROM chips and inserting them into
the page cache. I'd rather you didn't remove it :)

-- 
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread David S. Miller

   Date:Sat, 9 Dec 2000 00:45:51 -0800 (PST)
   From: Linus Torvalds <[EMAIL PROTECTED]>

out:
   -if (nr) {
   -ll_rw_block(WRITE, nr, arr);
   -} else {
   -UnlockPage(page);
   -}
   +UnlockPage(page);
   ClearPageUptodate(page);
   return err;
}
   @@ -1669,7 +1665,6 @@

It would seem that you would want to unlock the page _after_ clearing
the uptodate bit to make sure people sleeping on the page do not see
it set by accident.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Alexander Viro



On Sat, 9 Dec 2000, Andries Brouwer wrote:

> On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote:
> 
> > > @@ -1210,7 +1204,6 @@
> > [breada()]
> > Umm... why do we keep it, in the first place? AFAICS the only
> > in-tree user is hpfs_map_sector() and it doesn't look like we really
> > need it there. OTOH, trimming the buffer.c down is definitely nice.
> > Mikulas?
> 
> Throw it out. The number of users has diminished over time.
> Recently isofs stopped using breada.
> The hpfs use was broken, I fixed it a bit some time ago, but
> there is nothing against throwing it out altogether, I think.

I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry)
and it looks like we would be better off doing getblk() on affected sectors
and ll_rw_block() on the whole bunch - we end up calling breada() for
increasing block numbers with decreasing readahead window anyway.

So it probably should go - it gives no real win. Mikulas has the final
word here - he is the HPFS maintainer, so...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Andries Brouwer

On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote:

> > @@ -1210,7 +1204,6 @@
> [breada()]
>   Umm... why do we keep it, in the first place? AFAICS the only
> in-tree user is hpfs_map_sector() and it doesn't look like we really
> need it there. OTOH, trimming the buffer.c down is definitely nice.
> Mikulas?

Throw it out. The number of users has diminished over time.
Recently isofs stopped using breada.
The hpfs use was broken, I fixed it a bit some time ago, but
there is nothing against throwing it out altogether, I think.

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Alexander Viro



On Sat, 9 Dec 2000, Linus Torvalds wrote:

> This is a preliminary patch that I have not compiled and probably breaks,
> but you get the idea. I'm going to sleep, to survive another night with
> three small kids.
> 
> If somebody wants to run with this, please try it out, but realize that
> it's a work-in-progress. And realize that bugs in this area tend to
> corrupt filesystems very quickly indeed. I would strongly advice against
> trying this patch out without really grokking what it does and feeling
> confident that it actually works.
> 
> NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried
> to make all the changes truly straightforward and obvious. I want this bug
> squashed, and I don't want to see it again. But I _still_ think this is a
> dangerous patch until somebody like Al has given it a green light. Caveat
> Emptor.
 
> @@ -1001,7 +995,7 @@
>* and it is clean.
>*/
>   if (bh) {
> - init_buffer(bh, end_buffer_io_sync, NULL);
> + init_buffer(bh, end_buffer_io_bad, NULL);

How about NULL?

> @@ -1210,7 +1204,6 @@
[breada()]
Umm... why do we keep it, in the first place? AFAICS the only
in-tree user is hpfs_map_sector() and it doesn't look like we really
need it there. OTOH, trimming the buffer.c down is definitely nice.
Mikulas?

> @@ -1439,7 +1431,7 @@
[create_page_buffers()]
Linus, compare the result with create_empty_buffers(). Then look
at the only caller and notice that it will merrily loop over these
buffer_heads. IOW, I propose to mark them mapped and set the ->b_blocknr
in brw_page(), replace create_page_buffers() call with create_empty_buffers()
and let create_page_buffers() go to hell. Oh, and let create_empty_buffers()
take device as an argument, not inode as it does now.

> @@ -1600,6 +1590,8 @@
[__block_write_full_page()]
>  
>   bh = head;
>   i = 0;
> +
> + /* Stage 1: make sure we have all the buffers mapped! */
>   do {
>   /*
>* If the buffer isn't up-to-date, we can't be sure

Ahem. Think what will happen if we are sitting over the hole in file,
map some buffers and fail in the middle of the fun. Notice that we
do not submit any IO requests in that case, i.e. we just had created
a random crap in file.

More recovery needed here. I would just break out of the mapping loop,
then proceed as above, but limited the activity to mapped blocks only.
And did if (err) ClearPageUptodate(page) in the end.

> @@ -1669,7 +1665,6 @@
[__block_prepare_write()]

Same problem with recovery - see the patch I've sent recently (handling
get_block() failures).

> @@ -1793,35 +1787,40 @@
[block_read_full_page()]
> - init_buffer(bh, end_buffer_io_async, NULL);
Fine
> - atomic_inc(>b_count);

Why? It's cleaner the old way - why bother postponing that until we
lock the thing?

> + /* Stage two: lock the buffers */
> + for (i = 0; i < nr; i++) {
> + struct buffer_head * bh = arr[i];
> + lock_buffer(bh);
> + bh->b_end_io = end_buffer_io_async;
> + atomic_inc(>b_count);

See above.

> @@ -2263,67 +2261,31 @@
>   */
>  int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size)

>   if (!page->buffers)
> + create_page_buffers(rw, page, dev, b, size);

create_empty_buffers(page, dev, size);

> +
> + head = bh = page->buffers;
> + if (!head)
>   BUG();
>  
> - head = page->buffers;
> - bh = head;
> - nr = 0;
> + /* Stage 1: lock all the buffers */
>   do {
> - block = *(b++);
> + lock_buffer(bh);
> + bh->b_end_io = end_buffer_io_async;

bh->b_blocknr = *(b++);
set_bit(BH_Mapped, >b_state);

Modulo the comments above - fine with me. However, there is stuff in
drivers/md that I don't understand. Ingo, could you comment on the use of
->b_end_io there?

Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't
even check for buffers being mapped, let alone attempt to map them.
Fortunately, ext2 doesn't use it these days, but the rest of block
filesystems...  WTF? fsync() merrily ignores mmap()'ed stuff?
I mean, we can mmap() an area, dirty the bloody thing, call fsync() and
get zero traffic. Hmm... OTOH, it might be correct behaviour...
Anyway, fsync() on block filesystems other than ext2 needs fixing.
Badly. I'll port Stephen's patch to them.

IMO patch is mostly safe (the worst part is error recovery on
block_write_full_page()), except the writeout_one_page() part and possibly
the RAID stuff.
Linus, Ingo, Mikulas - comments?
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]

Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds



On Sat, 9 Dec 2000, Linus Torvalds wrote:
> 
> This is a preliminary patch that I have not compiled and probably breaks,
> but you get the idea. I'm going to sleep, to survive another night with
> three small kids.

Call me stupid [ Chorus: "You're stupid, Linus" ], but I actually compiled
and booted this remotely. And it came up. Which is a pretty good sign that
it doesn't have anything really broken in it.

Still, it would be good to have a few other sharp eyes looking it over.
Look sclean and obviously correct to me, but then _everything_ I write
always looks obviously correct yo me.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds


On Fri, 8 Dec 2000, Alexander Viro wrote:
> On Fri, 8 Dec 2000, Linus Torvalds wrote:
> 
> > Looking more at this issue, I suspect that the easiest pretty solution
> > that everybody can probably agree is reasonable is to either pass down the
> > end-of-io callback to ll_rw_block as you suggested, or, preferably by just
> > forcing the _caller_ to do the buffer locking, and just do the b_end_io
> > stuff inside the buffer lock and get rid of all the races that way
> > instead (and make ll_rw_block() verify that the buffers it is passed are
> > always locked).
> 
> Hmm... I've looked through the ll_rw_block() callers and yes, it seems
> to be doable.

Looking at this, there's a _lot_ of them.

I've taken a test-approach that is fairly simple:

 - get rid of the old "ll_rw_block()", because it was inherently racey wrt
   bh->b_end_io, I think we all agree that changing bh->b_end_io without
   holding any locks at all is fairly dangerous.

 - instead, a simple "submit_bh()" thing, that takes only one locked
   buffer head at a time, and queues it for IO. The usage would basically
   be

lock_buffer(bh);
bh->b_end_io = completion_callback;
submit_bh(READ|WRITE, bh);

   which is a pretty clean and simple interface that has no obvious
   races - submit_bh() will set bh->b_end_io to the completion callback.

 - BUT BUT BUT: Because of tons of old users of ll_rw_block(), we
   introduce a totally new ll_rw_block() that has the same old interface,
   but basically does

void ll_rw_block(int op, int nr, struct buffer_head **array)
{
int i;

for (i = 0; i < nr; i++) {
struct buffer_head * bh = array[i];
lock_buffer(bh);
bh->b_end_io = end_buffer_io_sync;
submit_bh(op, bh);
}
}

   Again, the above avoids the race (we never touch b_end_io except with
   the buffer lock held), and allows all old uses of "ll_rw_block()"
   _except_ the ones that wanted to do the fancy async callbacks.

The advantage? All the regular old code that isn't fancy (ie the low-level
filesystems, bread(), breada() etc) will get a working ll_rw_block() with
the semantics they want, and we can pretty much prove that they can never
get an async handler even by mistake.

And the (few) clever routines in fs/buffer.c that currently use
ll_rw_block() with async handlers can just be converted to use the
submit_bh() interface.

This is a preliminary patch that I have not compiled and probably breaks,
but you get the idea. I'm going to sleep, to survive another night with
three small kids.

If somebody wants to run with this, please try it out, but realize that
it's a work-in-progress. And realize that bugs in this area tend to
corrupt filesystems very quickly indeed. I would strongly advice against
trying this patch out without really grokking what it does and feeling
confident that it actually works.

NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried
to make all the changes truly straightforward and obvious. I want this bug
squashed, and I don't want to see it again. But I _still_ think this is a
dangerous patch until somebody like Al has given it a green light. Caveat
Emptor.

Linus

-
diff -u --recursive t12p7/linux/drivers/block/ll_rw_blk.c 
linux/drivers/block/ll_rw_blk.c
--- t12p7/linux/drivers/block/ll_rw_blk.c   Thu Dec  7 15:56:25 2000
+++ linux/drivers/block/ll_rw_blk.c Sat Dec  9 00:40:35 2000
@@ -885,6 +885,36 @@
while (q->make_request_fn(q, rw, bh));
 }
 
+
+/*
+ * Submit a buffer head for IO.
+ */
+void submit_bh(int rw, struct buffer_head * bh)
+{
+   if (!test_bit(BH_Lock, >b_state))
+   BUG();
+
+   set_bit(BH_Req, >b_state);
+
+   /*
+* First step, 'identity mapping' - RAID or LVM might
+* further remap this.
+*/
+   bh->b_rdev = bh->b_dev;
+   bh->b_rsector = bh->b_blocknr * (bh->b_size>>9);
+
+   generic_make_request(rw, bh);
+}
+
+/*
+ * Default IO end handler, used by "ll_rw_block()".
+ */
+static void end_buffer_io_sync(struct buffer_head *bh, int uptodate)
+{
+   mark_buffer_uptodate(bh, uptodate);
+   unlock_buffer(bh);
+}
+
 /* This function can be used to request a number of buffers from a block
device. Currently the only restriction is that all buffers must belong to
the same device */
@@ -931,7 +961,8 @@
if (test_and_set_bit(BH_Lock, >b_state))
continue;
 
-   set_bit(BH_Req, >b_state);
+   /* We have the buffer lock */
+   bh->b_end_io = end_buffer_io_sync;
 
switch(rw) {
case WRITE:
@@ -954,17 +985,9 @@
end_io:
bh->b_end_io(bh, test_bit(BH_Uptodate, >b_state));
continue;
-   
}

Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds


On Fri, 8 Dec 2000, Alexander Viro wrote:
 On Fri, 8 Dec 2000, Linus Torvalds wrote:
 
  Looking more at this issue, I suspect that the easiest pretty solution
  that everybody can probably agree is reasonable is to either pass down the
  end-of-io callback to ll_rw_block as you suggested, or, preferably by just
  forcing the _caller_ to do the buffer locking, and just do the b_end_io
  stuff inside the buffer lock and get rid of all the races that way
  instead (and make ll_rw_block() verify that the buffers it is passed are
  always locked).
 
 Hmm... I've looked through the ll_rw_block() callers and yes, it seems
 to be doable.

Looking at this, there's a _lot_ of them.

I've taken a test-approach that is fairly simple:

 - get rid of the old "ll_rw_block()", because it was inherently racey wrt
   bh-b_end_io, I think we all agree that changing bh-b_end_io without
   holding any locks at all is fairly dangerous.

 - instead, a simple "submit_bh()" thing, that takes only one locked
   buffer head at a time, and queues it for IO. The usage would basically
   be

lock_buffer(bh);
bh-b_end_io = completion_callback;
submit_bh(READ|WRITE, bh);

   which is a pretty clean and simple interface that has no obvious
   races - submit_bh() will set bh-b_end_io to the completion callback.

 - BUT BUT BUT: Because of tons of old users of ll_rw_block(), we
   introduce a totally new ll_rw_block() that has the same old interface,
   but basically does

void ll_rw_block(int op, int nr, struct buffer_head **array)
{
int i;

for (i = 0; i  nr; i++) {
struct buffer_head * bh = array[i];
lock_buffer(bh);
bh-b_end_io = end_buffer_io_sync;
submit_bh(op, bh);
}
}

   Again, the above avoids the race (we never touch b_end_io except with
   the buffer lock held), and allows all old uses of "ll_rw_block()"
   _except_ the ones that wanted to do the fancy async callbacks.

The advantage? All the regular old code that isn't fancy (ie the low-level
filesystems, bread(), breada() etc) will get a working ll_rw_block() with
the semantics they want, and we can pretty much prove that they can never
get an async handler even by mistake.

And the (few) clever routines in fs/buffer.c that currently use
ll_rw_block() with async handlers can just be converted to use the
submit_bh() interface.

This is a preliminary patch that I have not compiled and probably breaks,
but you get the idea. I'm going to sleep, to survive another night with
three small kids.

If somebody wants to run with this, please try it out, but realize that
it's a work-in-progress. And realize that bugs in this area tend to
corrupt filesystems very quickly indeed. I would strongly advice against
trying this patch out without really grokking what it does and feeling
confident that it actually works.

NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried
to make all the changes truly straightforward and obvious. I want this bug
squashed, and I don't want to see it again. But I _still_ think this is a
dangerous patch until somebody like Al has given it a green light. Caveat
Emptor.

Linus

-
diff -u --recursive t12p7/linux/drivers/block/ll_rw_blk.c 
linux/drivers/block/ll_rw_blk.c
--- t12p7/linux/drivers/block/ll_rw_blk.c   Thu Dec  7 15:56:25 2000
+++ linux/drivers/block/ll_rw_blk.c Sat Dec  9 00:40:35 2000
@@ -885,6 +885,36 @@
while (q-make_request_fn(q, rw, bh));
 }
 
+
+/*
+ * Submit a buffer head for IO.
+ */
+void submit_bh(int rw, struct buffer_head * bh)
+{
+   if (!test_bit(BH_Lock, bh-b_state))
+   BUG();
+
+   set_bit(BH_Req, bh-b_state);
+
+   /*
+* First step, 'identity mapping' - RAID or LVM might
+* further remap this.
+*/
+   bh-b_rdev = bh-b_dev;
+   bh-b_rsector = bh-b_blocknr * (bh-b_size9);
+
+   generic_make_request(rw, bh);
+}
+
+/*
+ * Default IO end handler, used by "ll_rw_block()".
+ */
+static void end_buffer_io_sync(struct buffer_head *bh, int uptodate)
+{
+   mark_buffer_uptodate(bh, uptodate);
+   unlock_buffer(bh);
+}
+
 /* This function can be used to request a number of buffers from a block
device. Currently the only restriction is that all buffers must belong to
the same device */
@@ -931,7 +961,8 @@
if (test_and_set_bit(BH_Lock, bh-b_state))
continue;
 
-   set_bit(BH_Req, bh-b_state);
+   /* We have the buffer lock */
+   bh-b_end_io = end_buffer_io_sync;
 
switch(rw) {
case WRITE:
@@ -954,17 +985,9 @@
end_io:
bh-b_end_io(bh, test_bit(BH_Uptodate, bh-b_state));
continue;
-   
}
 
-   /*
-   

Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds



On Sat, 9 Dec 2000, Linus Torvalds wrote:
 
 This is a preliminary patch that I have not compiled and probably breaks,
 but you get the idea. I'm going to sleep, to survive another night with
 three small kids.

Call me stupid [ Chorus: "You're stupid, Linus" ], but I actually compiled
and booted this remotely. And it came up. Which is a pretty good sign that
it doesn't have anything really broken in it.

Still, it would be good to have a few other sharp eyes looking it over.
Look sclean and obviously correct to me, but then _everything_ I write
always looks obviously correct yo me.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Alexander Viro



On Sat, 9 Dec 2000, Linus Torvalds wrote:

 This is a preliminary patch that I have not compiled and probably breaks,
 but you get the idea. I'm going to sleep, to survive another night with
 three small kids.
 
 If somebody wants to run with this, please try it out, but realize that
 it's a work-in-progress. And realize that bugs in this area tend to
 corrupt filesystems very quickly indeed. I would strongly advice against
 trying this patch out without really grokking what it does and feeling
 confident that it actually works.
 
 NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried
 to make all the changes truly straightforward and obvious. I want this bug
 squashed, and I don't want to see it again. But I _still_ think this is a
 dangerous patch until somebody like Al has given it a green light. Caveat
 Emptor.
 
 @@ -1001,7 +995,7 @@
* and it is clean.
*/
   if (bh) {
 - init_buffer(bh, end_buffer_io_sync, NULL);
 + init_buffer(bh, end_buffer_io_bad, NULL);

How about NULL?

 @@ -1210,7 +1204,6 @@
[breada()]
Umm... why do we keep it, in the first place? AFAICS the only
in-tree user is hpfs_map_sector() and it doesn't look like we really
need it there. OTOH, trimming the buffer.c down is definitely nice.
Mikulas?

 @@ -1439,7 +1431,7 @@
[create_page_buffers()]
Linus, compare the result with create_empty_buffers(). Then look
at the only caller and notice that it will merrily loop over these
buffer_heads. IOW, I propose to mark them mapped and set the -b_blocknr
in brw_page(), replace create_page_buffers() call with create_empty_buffers()
and let create_page_buffers() go to hell. Oh, and let create_empty_buffers()
take device as an argument, not inode as it does now.

 @@ -1600,6 +1590,8 @@
[__block_write_full_page()]
  
   bh = head;
   i = 0;
 +
 + /* Stage 1: make sure we have all the buffers mapped! */
   do {
   /*
* If the buffer isn't up-to-date, we can't be sure

Ahem. Think what will happen if we are sitting over the hole in file,
map some buffers and fail in the middle of the fun. Notice that we
do not submit any IO requests in that case, i.e. we just had created
a random crap in file.

More recovery needed here. I would just break out of the mapping loop,
then proceed as above, but limited the activity to mapped blocks only.
And did if (err) ClearPageUptodate(page) in the end.

 @@ -1669,7 +1665,6 @@
[__block_prepare_write()]

Same problem with recovery - see the patch I've sent recently (handling
get_block() failures).

 @@ -1793,35 +1787,40 @@
[block_read_full_page()]
 - init_buffer(bh, end_buffer_io_async, NULL);
Fine
 - atomic_inc(bh-b_count);

Why? It's cleaner the old way - why bother postponing that until we
lock the thing?

 + /* Stage two: lock the buffers */
 + for (i = 0; i  nr; i++) {
 + struct buffer_head * bh = arr[i];
 + lock_buffer(bh);
 + bh-b_end_io = end_buffer_io_async;
 + atomic_inc(bh-b_count);

See above.

 @@ -2263,67 +2261,31 @@
   */
  int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size)

   if (!page-buffers)
 + create_page_buffers(rw, page, dev, b, size);

create_empty_buffers(page, dev, size);

 +
 + head = bh = page-buffers;
 + if (!head)
   BUG();
  
 - head = page-buffers;
 - bh = head;
 - nr = 0;
 + /* Stage 1: lock all the buffers */
   do {
 - block = *(b++);
 + lock_buffer(bh);
 + bh-b_end_io = end_buffer_io_async;

bh-b_blocknr = *(b++);
set_bit(BH_Mapped, bh-b_state);

Modulo the comments above - fine with me. However, there is stuff in
drivers/md that I don't understand. Ingo, could you comment on the use of
-b_end_io there?

Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't
even check for buffers being mapped, let alone attempt to map them.
Fortunately, ext2 doesn't use it these days, but the rest of block
filesystems... doubletake WTF? fsync() merrily ignores mmap()'ed stuff?
I mean, we can mmap() an area, dirty the bloody thing, call fsync() and
get zero traffic. Hmm... OTOH, it might be correct behaviour...
Anyway, fsync() on block filesystems other than ext2 needs fixing.
Badly. I'll port Stephen's patch to them.

IMO patch is mostly safe (the worst part is error recovery on
block_write_full_page()), except the writeout_one_page() part and possibly
the RAID stuff.
Linus, Ingo, Mikulas - comments?
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Andries Brouwer

On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote:

  @@ -1210,7 +1204,6 @@
 [breada()]
   Umm... why do we keep it, in the first place? AFAICS the only
 in-tree user is hpfs_map_sector() and it doesn't look like we really
 need it there. OTOH, trimming the buffer.c down is definitely nice.
 Mikulas?

Throw it out. The number of users has diminished over time.
Recently isofs stopped using breada.
The hpfs use was broken, I fixed it a bit some time ago, but
there is nothing against throwing it out altogether, I think.

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Alexander Viro



On Sat, 9 Dec 2000, Andries Brouwer wrote:

 On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote:
 
   @@ -1210,7 +1204,6 @@
  [breada()]
  Umm... why do we keep it, in the first place? AFAICS the only
  in-tree user is hpfs_map_sector() and it doesn't look like we really
  need it there. OTOH, trimming the buffer.c down is definitely nice.
  Mikulas?
 
 Throw it out. The number of users has diminished over time.
 Recently isofs stopped using breada.
 The hpfs use was broken, I fixed it a bit some time ago, but
 there is nothing against throwing it out altogether, I think.

I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry)
and it looks like we would be better off doing getblk() on affected sectors
and ll_rw_block() on the whole bunch - we end up calling breada() for
increasing block numbers with decreasing readahead window anyway.

So it probably should go - it gives no real win. Mikulas has the final
word here - he is the HPFS maintainer, so...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread David S. Miller

   Date:Sat, 9 Dec 2000 00:45:51 -0800 (PST)
   From: Linus Torvalds [EMAIL PROTECTED]

out:
   -if (nr) {
   -ll_rw_block(WRITE, nr, arr);
   -} else {
   -UnlockPage(page);
   -}
   +UnlockPage(page);
   ClearPageUptodate(page);
   return err;
}
   @@ -1669,7 +1665,6 @@

It would seem that you would want to unlock the page _after_ clearing
the uptodate bit to make sure people sleeping on the page do not see
it set by accident.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread David Woodhouse

On Fri, 8 Dec 2000, Alexander Viro wrote:

 BTW, what do you think about the following:
   * add_to_page_cache() is not exported and never used. Kill?

I have my eye on that for execute-in-place of romfs from real ROM chips -
making up struct pages for parts of the ROM chips and inserting them into
the page cache. I'd rather you didn't remove it :)

-- 
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds



On Sat, 9 Dec 2000, Alexander Viro wrote:
   Fine
  -   atomic_inc(bh-b_count);
 
   Why? It's cleaner the old way - why bother postponing that until we
 lock the thing?

I had a rule: we always do the "lock_buffer()" and "b_count increment"
together with setting "b_end_io = end_buffer_io_async". Why? Because that
way we pair up the actions, and I could easily verify that every single
user of "end_buffer_io_async" will increment the count (that is
decremented in "end_buffer_io_async").

We never used to have any rules in this area, and it was sometimes hard to
match up the actions with each other.

   int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size)
 
  if (!page-buffers)
  +   create_page_buffers(rw, page, dev, b, size);
 
   create_empty_buffers(page, dev, size);

Agreed.

   Modulo the comments above - fine with me. However, there is stuff in
 drivers/md that I don't understand. Ingo, could you comment on the use of
 -b_end_io there?

I already sent him mail about it for the same reason. 

   Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't
 even check for buffers being mapped, let alone attempt to map them.
   Fortunately, ext2 doesn't use it these days, but the rest of block
 filesystems... doubletake WTF? fsync() merrily ignores mmap()'ed stuff?

fsync() has _always_ ignored mmap'ed stuff. 

If you want to get your mappings synchronized, you absolutely positively
have to call "msync()".

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Andi Kleen

Linus Torvalds [EMAIL PROTECTED] writes:


  Modulo the comments above - fine with me. However, there is stuff in
  drivers/md that I don't understand. Ingo, could you comment on the use of
  -b_end_io there?
 
 I already sent him mail about it for the same reason. 

How about sending mail to all the journaling FS groups too? -- the change is surely
to break all the JFS which use usually b_end_io to submit the data after the journal
has been flushed.


-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Mikulas Patocka

On Sat, 9 Dec 2000, Alexander Viro wrote:

 On Sat, 9 Dec 2000, Andries Brouwer wrote:
 
  On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote:
  
@@ -1210,7 +1204,6 @@
   [breada()]
 Umm... why do we keep it, in the first place? AFAICS the only
   in-tree user is hpfs_map_sector() and it doesn't look like we really
   need it there. OTOH, trimming the buffer.c down is definitely nice.
   Mikulas?
  
  Throw it out. The number of users has diminished over time.
  Recently isofs stopped using breada.
  The hpfs use was broken, I fixed it a bit some time ago, but
  there is nothing against throwing it out altogether, I think.
 
 I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry)
 and it looks like we would be better off doing getblk() on affected sectors
 and ll_rw_block() on the whole bunch - we end up calling breada() for
 increasing block numbers with decreasing readahead window anyway.
 
 So it probably should go - it gives no real win. Mikulas has the final
 word here - he is the HPFS maintainer, so...

I did a test. I disabled readahead except for reading all 4 buffers in
map_4sectors.

I observed 14% slowdown on walking directories with find and 4% slowdown
on grepping one my working directory (10M, 281 files).

If you can't make it otherwise you can rip readahead out. If there is a
possibility to keep it, it would be better.

Mikulas


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-09 Thread Linus Torvalds



On Sat, 9 Dec 2000, Mikulas Patocka wrote:
 
 I did a test. I disabled readahead except for reading all 4 buffers in
 map_4sectors.
 
 I observed 14% slowdown on walking directories with find and 4% slowdown
 on grepping one my working directory (10M, 281 files).
 
 If you can't make it otherwise you can rip readahead out. If there is a
 possibility to keep it, it would be better.

The absolutely best thing would be to keep the directories in the page
cache. At which point the whole issue becomes pretty moot and we could use
the page-cache readahead code. Which gets the end right, and can handle
stuff that isn't physically contiguous on disk.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro



On Fri, 8 Dec 2000, Linus Torvalds wrote:

> Looking more at this issue, I suspect that the easiest pretty solution
> that everybody can probably agree is reasonable is to either pass down the
> end-of-io callback to ll_rw_block as you suggested, or, preferably by just
> forcing the _caller_ to do the buffer locking, and just do the b_end_io
> stuff inside the buffer lock and get rid of all the races that way
> instead (and make ll_rw_block() verify that the buffers it is passed are
> always locked).

Hmm... I've looked through the ll_rw_block() callers and yes, it seems
to be doable. BTW, we probably want a helper function that would do
lock+ll_rw_block+wait - it will simplify the life in filesystems.
I'll do a patch tonight.
Cheers,
Al
PS: alpha-testers needed for sysvfs patches - right now the thing is racey
as hell and unlike minixfs I can't test it myself. It's more or less
straightforward port of minixfs patch that went into the tree sometime
ago, so it shouldn't be too bad, but... IOW, if somebody has boxen to
test the thing on - yell.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Alexander Viro wrote:
> 
> I'm quite aware of that fact ;-) However, you said 
> 
>On the other hand, I have this suspicion that there is an even simpler
>solution: stop using the end_buffer_io_sync version for writes
>altogether.
> 
> If that happens (i.e. if write requests resulting from prepare_write()/
> commit_write()/bdflush sequence become async) we must stop unlocking pages
> after commit_write(). Essentially it would become unlocker of the same
> kind as readpage() and writepage() - callers must assume that page submitted
> to commit_write() will eventually be unlocked.

You're right, we can't do that for anonymous buffers right now. Mea culpa.

Looking more at this issue, I suspect that the easiest pretty solution
that everybody can probably agree is reasonable is to either pass down the
end-of-io callback to ll_rw_block as you suggested, or, preferably by just
forcing the _caller_ to do the buffer locking, and just do the b_end_io
stuff inside the buffer lock and get rid of all the races that way
instead (and make ll_rw_block() verify that the buffers it is passed are
always locked).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro



On Fri, 8 Dec 2000, Linus Torvalds wrote:

> 
> 
> On Fri, 8 Dec 2000, Alexander Viro wrote:
> > 
> > Erm... So you want to make ->commit_write() page-unlocking? Fine with me,
> > but that will make for somewhat bigger patch. Hey, _you_ are in position
> > to change the locking rules, freeze or not, so if it's OK with you...
> 
> No.
> 
> Read the code a bit more.
> 
> commit_write() doesn't start any IO at all. It only marks the buffer
> dirty.

I'm quite aware of that fact ;-) However, you said 

   On the other hand, I have this suspicion that there is an even simpler
   solution: stop using the end_buffer_io_sync version for writes
   altogether.

If that happens (i.e. if write requests resulting from prepare_write()/
commit_write()/bdflush sequence become async) we must stop unlocking pages
after commit_write(). Essentially it would become unlocker of the same
kind as readpage() and writepage() - callers must assume that page submitted
to commit_write() will eventually be unlocked.

> The dirty flush-out works the way it has always worked.
> 
> (You're right in that the dirty flusher should make sure to change
> b_end_io when they do their write-outs - that code used to just depend on 
> the buffer always having b_end_io magically set)

Hmm... IDGI. Do you want the request resulting from commit_write() ("resulting"
!= "issued") to stay sync (in that case we still need to change
block_write_full_page() since the analysis stands - it dirties blocks
unconditionally) or you want them to go async (in that case we need to change
commit_write() callers)? Could you clarify?

> Btw, I also think that the dirty buffer flushing should get the page lock.
> Right now it touches the buffer list without holding the lock on the page
> that the buffer is on, which means that there is really nothign that
> prevents it from racing with the page-based writeout. I don't like that:
> it does hold the LRU list lock, so the list state itself is ok, but it
> does actually touch part of the buffer state that is not really protected
> by the lock.
> 
> I think it ends up being ok because ll_rw_block will ignore buffers that
> get output twice, and the rest of the state is handled with atomic
> operations (b_count and b_flags), but it's not really proper. If
> flush_dirty_buffers() got the page lock, everything would be protected
> properly. Thoughts?

Umm... I don't think that we need to do it on per-page basis. We need some
exclusion, all right, but I'ld rather provide it from block_write_full_page()
and its ilk. Hell knows...
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Rajagopal Ananthanarayanan

Linus Torvalds wrote:
> 
> On Fri, 8 Dec 2000, Daniel Phillips wrote:
> >
> > [ flush-buffers taking the page lock ]
> >
> > This is great when you have buffersize==pagesize.  When there are
> > multiple buffers per page it means that some of the buffers might have
> > to wait for flushing just because bdflush started IO on some other
> > buffer on the same page.  Oh well.  The common case improves in terms
> > being proveably correct and the uncommon case gets worse a tiny bit.
> > It sounds like a win.
> 
> Also, I think that we should strive for a setup where most of the dirty
> buffer flushing is done through "page_launder()" instead of using
> sync_buffers all that much at all.
> 
> I'm convinced that the page LRU list is as least as good as, if not better
> than, the dirty buffer timestamp stuff. And as we need to have the page
> LRU for other reasons anyway, I'd like the long-range plan to be to get
> rid of the buffer LRU completely. It wastes memory and increases
> complexity for very little gain, I think.
> 

I think flushing pages instead of buffers is a good direction to take.
Two things:

1. currently bdflush is setup to use page_launder only
   under memory pressure (if (free_shortage() ... )
   Do you think that it should call page_launder regardless?

2. There are two operations here:
a. starting a write-back, periodically.
b. freeing a page, which may involve taking the page
out of a inode mapping, etc. IOW, what page_launder does.
   bdflush primarily does (a). If we want to move to page-oriented
   flushing, we atleast need extra information in the _page_ structure
   as to whether it is time to flush the page back.


--
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Daniel Phillips wrote:
>
> [ flush-buffers taking the page lock ]
> 
> This is great when you have buffersize==pagesize.  When there are
> multiple buffers per page it means that some of the buffers might have
> to wait for flushing just because bdflush started IO on some other
> buffer on the same page.  Oh well.  The common case improves in terms
> being proveably correct and the uncommon case gets worse a tiny bit. 
> It sounds like a win.

Also, I think that we should strive for a setup where most of the dirty
buffer flushing is done through "page_launder()" instead of using
sync_buffers all that much at all. 

I'm convinced that the page LRU list is as least as good as, if not better
than, the dirty buffer timestamp stuff. And as we need to have the page
LRU for other reasons anyway, I'd like the long-range plan to be to get
rid of the buffer LRU completely. It wastes memory and increases
complexity for very little gain, I think.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Daniel Phillips

On Fri, 08 Dec 2000, Linus Torvalds wrote:
> Btw, I also think that the dirty buffer flushing should get the page lock.
> Right now it touches the buffer list without holding the lock on the page
> that the buffer is on, which means that there is really nothign that
> prevents it from racing with the page-based writeout. I don't like that:
> it does hold the LRU list lock, so the list state itself is ok, but it
> does actually touch part of the buffer state that is not really protected
> by the lock.
> 
> I think it ends up being ok because ll_rw_block will ignore buffers that
> get output twice, and the rest of the state is handled with atomic
> operations (b_count and b_flags), but it's not really proper. If
> flush_dirty_buffers() got the page lock, everything would be protected
> properly. Thoughts?

This is great when you have buffersize==pagesize.  When there are
multiple buffers per page it means that some of the buffers might have
to wait for flushing just because bdflush started IO on some other
buffer on the same page.  Oh well.  The common case improves in terms
being proveably correct and the uncommon case gets worse a tiny bit. 
It sounds like a win.

We are moving steadily away from buffer oriented I/O anyway, and I
think we can optimize the case of buffersize!=pagesize in 2.5 by being a
little more careful about how getblk hands out buffers.  Getblk
could force all buffers on the same page to be on the same
major/minor, or even better, to be physically close together.  Or
more simply, flush_dirty_buffers could check for other dirty buffers on
the same page and initiate I/O at the same time.  Or both strategies.

This is by way of buttressing an argument in favor of simplicity
and reliabilty, i.e., being religious about taking the page lock, even
when there's a slight cost in the short term.

-- 
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Alexander Viro wrote:
> 
> Erm... So you want to make ->commit_write() page-unlocking? Fine with me,
> but that will make for somewhat bigger patch. Hey, _you_ are in position
> to change the locking rules, freeze or not, so if it's OK with you...

No.

Read the code a bit more.

commit_write() doesn't start any IO at all. It only marks the buffer
dirty.

The dirty flush-out works the way it has always worked.

(You're right in that the dirty flusher should make sure to change
b_end_io when they do their write-outs - that code used to just depend on 
the buffer always having b_end_io magically set)

> Hrm. Why not move setting ->b_end_io to ll_rw_block() while we are at it?
> Or into ll_rw_block() wrapper...

That's too big a change right now, but yes, in theory. That would make
sure that nobody could ever forget to set the "completion" handler.

In the meantime, let's just enforce it for ll_rw_block: make sure that
there is a 1:1 mapping between setting b_end_io and doing a ll_rw_block
(and let's make sure that there are no more of these non-local rules any
more).

Btw, I also think that the dirty buffer flushing should get the page lock.
Right now it touches the buffer list without holding the lock on the page
that the buffer is on, which means that there is really nothign that
prevents it from racing with the page-based writeout. I don't like that:
it does hold the LRU list lock, so the list state itself is ok, but it
does actually touch part of the buffer state that is not really protected
by the lock.

I think it ends up being ok because ll_rw_block will ignore buffers that
get output twice, and the rest of the state is handled with atomic
operations (b_count and b_flags), but it's not really proper. If
flush_dirty_buffers() got the page lock, everything would be protected
properly. Thoughts?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Alexander Viro wrote:
> 
> Fix: postpone changing ->b_end_io until the call of ll_rw_block(); if by
> the time of ll_rw_block() some fragments will still have IO in progress -
> wait on them.
> 
> Comments?

Yes.

On the other hand, I have this suspicion that there is an even simpler
solution: stop using the end_buffer_io_sync version for writes
altogether.

It's really only __block_prepare_write() that can mark the buffers for
sync writes, and even that case is fairly bogus: it really only wants to
mark the non-upto-date buffers that it wants to _read_ for sync IO, it
just "knows" that it is ok to change every buffer it sees. It isn't.

Moving the 

bh->b_end_io = end_buffer_io_sync;

into the read-path should be sufficient (hell, we should move the
"ll_rw_block(READ, 1, )" away, and make it one single read with

ll_rw_block(READ, wait_bh-wait, wait);

at the end of the loop.

If we do it that way, there is no way the two can clash, because a
non-up-to-date buffer head won't ever be touched by the write path, so we
can't get this kind of confusion.

Al? I'd really prefer this alternative instead. Do you see any problems
with it?

(New rule that makes 100% sense: NOBODY sets "bh->b_end_io" on any buffer
that he isn't actually going to do IO on.  And if there is pending IO on
the buffer, it had better only be of the same type as the one you're going
to do).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro

Folks, see if the following patch helps. AFAICS it closes a pretty real
race - we could call block_write_full_page() for a page that has sync
IO in progress and blindly change ->b_end_io callbacks on the bh with
pending requests. With a little bit of bad luck they would complete before
we got to ll_rw_block(), thus leading to extra UnlockPage(). All it takes
is a pageout on a page that got partial write recently - if some fragments
were still unmapped we get to call get_block() on them and it can easily
block, providing a decent window for that race.

Fix: postpone changing ->b_end_io until the call of ll_rw_block(); if by
the time of ll_rw_block() some fragments will still have IO in progress -
wait on them.

Comments?
Cheers,
Al

--- buffer.cFri Dec  8 16:19:53 2000
+++ buffer.c.newFri Dec  8 16:26:44 2000
@@ -1577,6 +1577,26 @@
  * "Dirty" is valid only with the last case (mapped+uptodate).
  */
 
+static void write_array_async(struct page *page, struct buffer_head **p, int n)
+{
+   int i;
+   if (!n) {
+   UnlockPage(page);
+   return;
+   }
+   /*
+* If there are pending requests on these guys - wait before changing
+* ->b_end_io.
+*/
+   for (i=0; ib_state);
+   set_bit(BH_Dirty, [i]->b_state);
+   p[i]->b_end_io = end_buffer_io_async;
+   }
+   ll_rw_block(WRITE, n, p);
+}
+
 /*
  * block_write_full_page() is SMP-safe - currently it's still
  * being called with the kernel lock held, but the code is ready.
@@ -1616,28 +1636,17 @@
if (buffer_new(bh))
unmap_underlying_metadata(bh);
}
-   set_bit(BH_Uptodate, >b_state);
-   set_bit(BH_Dirty, >b_state);
-   bh->b_end_io = end_buffer_io_async;
atomic_inc(>b_count);
arr[nr++] = bh;
bh = bh->b_this_page;
block++;
} while (bh != head);
 
-   if (nr) {
-   ll_rw_block(WRITE, nr, arr);
-   } else {
-   UnlockPage(page);
-   }
+   write_array_async(page, arr, nr);
SetPageUptodate(page);
return 0;
 out:
-   if (nr) {
-   ll_rw_block(WRITE, nr, arr);
-   } else {
-   UnlockPage(page);
-   }
+   write_array_async(page, arr, nr);
ClearPageUptodate(page);
return err;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[found?] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro

I'm doing some massive grepping (basically, audit of page locking),
but nothing relevant so far. There was some catch (aside of documenting
the thing and finding several completely unrelated buglets):
* ramfs_writepage() doesn't UnlockPage(). Deadlock.
* udf_adinicb_writepage() does extra UnlockPage().
I don't see the fsckup in loop.c, though. On the read path it uses
do_generic_read_file() and on the write it's essentially the simplified
variant of generic_file_write(). Hell knows... It looks like we are
getting dirty buffer inheriting end_buffer_io_async from the previous
life.

Oh, damn it. Indeed. Look: 

generic_file_write() or lo_send():
lock_page()
->prepare_write()   sets sync ->b_end_io
->commit_write()puts them on the dirty list
UnlockPage()releases the page lock
... requests are sent to driver

page_launder():
TryLockPage()   succeeds
block_write_full_page()
... goes through the bh'es and sets ->b_end_io to end_buffer_io_async()
at that point the last remaining request completes. It calls
->b_end_io(), aka. end_buffer_io_async(). And does UnlockPage().

In the meanwhile, we call ll_rw_block(). Requests are sent again.
When _they_ complete we get the second UnlockPage()

Now, I might miss some obvious reason why it could never happen. Moreover,
the real problem may be completely different - the race above is not too wide.

However, I'ld really like to hear why the scenario above is impossible. BTW,
the race isn't even that narrow - if ->prepare_write() didn't cover the
whole page we've got a get_block() to call and there's a plenty of time when
shit can hit the fan - it's a blocking operation, after all.

Comments?
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread David Woodhouse


[EMAIL PROTECTED] said:
>  Can you test some more? In particular, I'd love to hear if this
> happens with vfat even without loopback, or with loopback even without
> vfat (make an ext2 filesystem or similar instead). That woul dnarrow
> down the bug further. 

Loopback-mounted iso9660 does it too. This was #3 on my list of test12-pre5 
oospen to investigate this weekend :)

--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread David Woodhouse


[EMAIL PROTECTED] said:
  Can you test some more? In particular, I'd love to hear if this
 happens with vfat even without loopback, or with loopback even without
 vfat (make an ext2 filesystem or similar instead). That woul dnarrow
 down the bug further. 

Loopback-mounted iso9660 does it too. This was #3 on my list of test12-pre5 
oospen to investigate this weekend :)

--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[found?] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro

I'm doing some massive grepping (basically, audit of page locking),
but nothing relevant so far. There was some catch (aside of documenting
the thing and finding several completely unrelated buglets):
* ramfs_writepage() doesn't UnlockPage(). Deadlock.
* udf_adinicb_writepage() does extra UnlockPage().
I don't see the fsckup in loop.c, though. On the read path it uses
do_generic_read_file() and on the write it's essentially the simplified
variant of generic_file_write(). Hell knows... It looks like we are
getting dirty buffer inheriting end_buffer_io_async from the previous
life.

Oh, damn it. Indeed. Look: 

generic_file_write() or lo_send():
lock_page()
-prepare_write()   sets sync -b_end_io
-commit_write()puts them on the dirty list
UnlockPage()releases the page lock
... requests are sent to driver

page_launder():
TryLockPage()   succeeds
block_write_full_page()
... goes through the bh'es and sets -b_end_io to end_buffer_io_async()
at that point the last remaining request completes. It calls
-b_end_io(), aka. end_buffer_io_async(). And does UnlockPage().

In the meanwhile, we call ll_rw_block(). Requests are sent again.
When _they_ complete we get the second UnlockPage()

Now, I might miss some obvious reason why it could never happen. Moreover,
the real problem may be completely different - the race above is not too wide.

However, I'ld really like to hear why the scenario above is impossible. BTW,
the race isn't even that narrow - if -prepare_write() didn't cover the
whole page we've got a get_block() to call and there's a plenty of time when
shit can hit the fan - it's a blocking operation, after all.

Comments?
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro

Folks, see if the following patch helps. AFAICS it closes a pretty real
race - we could call block_write_full_page() for a page that has sync
IO in progress and blindly change -b_end_io callbacks on the bh with
pending requests. With a little bit of bad luck they would complete before
we got to ll_rw_block(), thus leading to extra UnlockPage(). All it takes
is a pageout on a page that got partial write recently - if some fragments
were still unmapped we get to call get_block() on them and it can easily
block, providing a decent window for that race.

Fix: postpone changing -b_end_io until the call of ll_rw_block(); if by
the time of ll_rw_block() some fragments will still have IO in progress -
wait on them.

Comments?
Cheers,
Al

--- buffer.cFri Dec  8 16:19:53 2000
+++ buffer.c.newFri Dec  8 16:26:44 2000
@@ -1577,6 +1577,26 @@
  * "Dirty" is valid only with the last case (mapped+uptodate).
  */
 
+static void write_array_async(struct page *page, struct buffer_head **p, int n)
+{
+   int i;
+   if (!n) {
+   UnlockPage(page);
+   return;
+   }
+   /*
+* If there are pending requests on these guys - wait before changing
+* -b_end_io.
+*/
+   for (i=0; in; i++) {
+   wait_on_buffer(p[i]);
+   set_bit(BH_Uptodate, p[i]-b_state);
+   set_bit(BH_Dirty, p[i]-b_state);
+   p[i]-b_end_io = end_buffer_io_async;
+   }
+   ll_rw_block(WRITE, n, p);
+}
+
 /*
  * block_write_full_page() is SMP-safe - currently it's still
  * being called with the kernel lock held, but the code is ready.
@@ -1616,28 +1636,17 @@
if (buffer_new(bh))
unmap_underlying_metadata(bh);
}
-   set_bit(BH_Uptodate, bh-b_state);
-   set_bit(BH_Dirty, bh-b_state);
-   bh-b_end_io = end_buffer_io_async;
atomic_inc(bh-b_count);
arr[nr++] = bh;
bh = bh-b_this_page;
block++;
} while (bh != head);
 
-   if (nr) {
-   ll_rw_block(WRITE, nr, arr);
-   } else {
-   UnlockPage(page);
-   }
+   write_array_async(page, arr, nr);
SetPageUptodate(page);
return 0;
 out:
-   if (nr) {
-   ll_rw_block(WRITE, nr, arr);
-   } else {
-   UnlockPage(page);
-   }
+   write_array_async(page, arr, nr);
ClearPageUptodate(page);
return err;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Alexander Viro wrote:
 
 Fix: postpone changing -b_end_io until the call of ll_rw_block(); if by
 the time of ll_rw_block() some fragments will still have IO in progress -
 wait on them.
 
 Comments?

Yes.

On the other hand, I have this suspicion that there is an even simpler
solution: stop using the end_buffer_io_sync version for writes
altogether.

It's really only __block_prepare_write() that can mark the buffers for
sync writes, and even that case is fairly bogus: it really only wants to
mark the non-upto-date buffers that it wants to _read_ for sync IO, it
just "knows" that it is ok to change every buffer it sees. It isn't.

Moving the 

bh-b_end_io = end_buffer_io_sync;

into the read-path should be sufficient (hell, we should move the
"ll_rw_block(READ, 1, bh)" away, and make it one single read with

ll_rw_block(READ, wait_bh-wait, wait);

at the end of the loop.

If we do it that way, there is no way the two can clash, because a
non-up-to-date buffer head won't ever be touched by the write path, so we
can't get this kind of confusion.

Al? I'd really prefer this alternative instead. Do you see any problems
with it?

(New rule that makes 100% sense: NOBODY sets "bh-b_end_io" on any buffer
that he isn't actually going to do IO on.  And if there is pending IO on
the buffer, it had better only be of the same type as the one you're going
to do).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Alexander Viro wrote:
 
 Erm... So you want to make -commit_write() page-unlocking? Fine with me,
 but that will make for somewhat bigger patch. Hey, _you_ are in position
 to change the locking rules, freeze or not, so if it's OK with you...

No.

Read the code a bit more.

commit_write() doesn't start any IO at all. It only marks the buffer
dirty.

The dirty flush-out works the way it has always worked.

(You're right in that the dirty flusher should make sure to change
b_end_io when they do their write-outs - that code used to just depend on 
the buffer always having b_end_io magically set)

 Hrm. Why not move setting -b_end_io to ll_rw_block() while we are at it?
 Or into ll_rw_block() wrapper...

That's too big a change right now, but yes, in theory. That would make
sure that nobody could ever forget to set the "completion" handler.

In the meantime, let's just enforce it for ll_rw_block: make sure that
there is a 1:1 mapping between setting b_end_io and doing a ll_rw_block
(and let's make sure that there are no more of these non-local rules any
more).

Btw, I also think that the dirty buffer flushing should get the page lock.
Right now it touches the buffer list without holding the lock on the page
that the buffer is on, which means that there is really nothign that
prevents it from racing with the page-based writeout. I don't like that:
it does hold the LRU list lock, so the list state itself is ok, but it
does actually touch part of the buffer state that is not really protected
by the lock.

I think it ends up being ok because ll_rw_block will ignore buffers that
get output twice, and the rest of the state is handled with atomic
operations (b_count and b_flags), but it's not really proper. If
flush_dirty_buffers() got the page lock, everything would be protected
properly. Thoughts?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Daniel Phillips

On Fri, 08 Dec 2000, Linus Torvalds wrote:
 Btw, I also think that the dirty buffer flushing should get the page lock.
 Right now it touches the buffer list without holding the lock on the page
 that the buffer is on, which means that there is really nothign that
 prevents it from racing with the page-based writeout. I don't like that:
 it does hold the LRU list lock, so the list state itself is ok, but it
 does actually touch part of the buffer state that is not really protected
 by the lock.
 
 I think it ends up being ok because ll_rw_block will ignore buffers that
 get output twice, and the rest of the state is handled with atomic
 operations (b_count and b_flags), but it's not really proper. If
 flush_dirty_buffers() got the page lock, everything would be protected
 properly. Thoughts?

This is great when you have buffersize==pagesize.  When there are
multiple buffers per page it means that some of the buffers might have
to wait for flushing just because bdflush started IO on some other
buffer on the same page.  Oh well.  The common case improves in terms
being proveably correct and the uncommon case gets worse a tiny bit. 
It sounds like a win.

We are moving steadily away from buffer oriented I/O anyway, and I
think we can optimize the case of buffersize!=pagesize in 2.5 by being a
little more careful about how getblk hands out buffers.  Getblk
could force all buffers on the same page to be on the same
major/minor, or even better, to be physically close together.  Or
more simply, flush_dirty_buffers could check for other dirty buffers on
the same page and initiate I/O at the same time.  Or both strategies.

This is by way of buttressing an argument in favor of simplicity
and reliabilty, i.e., being religious about taking the page lock, even
when there's a slight cost in the short term.

-- 
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Daniel Phillips wrote:

 [ flush-buffers taking the page lock ]
 
 This is great when you have buffersize==pagesize.  When there are
 multiple buffers per page it means that some of the buffers might have
 to wait for flushing just because bdflush started IO on some other
 buffer on the same page.  Oh well.  The common case improves in terms
 being proveably correct and the uncommon case gets worse a tiny bit. 
 It sounds like a win.

Also, I think that we should strive for a setup where most of the dirty
buffer flushing is done through "page_launder()" instead of using
sync_buffers all that much at all. 

I'm convinced that the page LRU list is as least as good as, if not better
than, the dirty buffer timestamp stuff. And as we need to have the page
LRU for other reasons anyway, I'd like the long-range plan to be to get
rid of the buffer LRU completely. It wastes memory and increases
complexity for very little gain, I think.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro



On Fri, 8 Dec 2000, Linus Torvalds wrote:

 
 
 On Fri, 8 Dec 2000, Alexander Viro wrote:
  
  Erm... So you want to make -commit_write() page-unlocking? Fine with me,
  but that will make for somewhat bigger patch. Hey, _you_ are in position
  to change the locking rules, freeze or not, so if it's OK with you...
 
 No.
 
 Read the code a bit more.
 
 commit_write() doesn't start any IO at all. It only marks the buffer
 dirty.

I'm quite aware of that fact ;-) However, you said 

   On the other hand, I have this suspicion that there is an even simpler
   solution: stop using the end_buffer_io_sync version for writes
   altogether.

If that happens (i.e. if write requests resulting from prepare_write()/
commit_write()/bdflush sequence become async) we must stop unlocking pages
after commit_write(). Essentially it would become unlocker of the same
kind as readpage() and writepage() - callers must assume that page submitted
to commit_write() will eventually be unlocked.

 The dirty flush-out works the way it has always worked.
 
 (You're right in that the dirty flusher should make sure to change
 b_end_io when they do their write-outs - that code used to just depend on 
 the buffer always having b_end_io magically set)

Hmm... IDGI. Do you want the request resulting from commit_write() ("resulting"
!= "issued") to stay sync (in that case we still need to change
block_write_full_page() since the analysis stands - it dirties blocks
unconditionally) or you want them to go async (in that case we need to change
commit_write() callers)? Could you clarify?

 Btw, I also think that the dirty buffer flushing should get the page lock.
 Right now it touches the buffer list without holding the lock on the page
 that the buffer is on, which means that there is really nothign that
 prevents it from racing with the page-based writeout. I don't like that:
 it does hold the LRU list lock, so the list state itself is ok, but it
 does actually touch part of the buffer state that is not really protected
 by the lock.
 
 I think it ends up being ok because ll_rw_block will ignore buffers that
 get output twice, and the rest of the state is handled with atomic
 operations (b_count and b_flags), but it's not really proper. If
 flush_dirty_buffers() got the page lock, everything would be protected
 properly. Thoughts?

Umm... I don't think that we need to do it on per-page basis. We need some
exclusion, all right, but I'ld rather provide it from block_write_full_page()
and its ilk. Hell knows...
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Linus Torvalds



On Fri, 8 Dec 2000, Alexander Viro wrote:
 
 I'm quite aware of that fact ;-) However, you said 
 
On the other hand, I have this suspicion that there is an even simpler
solution: stop using the end_buffer_io_sync version for writes
altogether.
 
 If that happens (i.e. if write requests resulting from prepare_write()/
 commit_write()/bdflush sequence become async) we must stop unlocking pages
 after commit_write(). Essentially it would become unlocker of the same
 kind as readpage() and writepage() - callers must assume that page submitted
 to commit_write() will eventually be unlocked.

You're right, we can't do that for anonymous buffers right now. Mea culpa.

Looking more at this issue, I suspect that the easiest pretty solution
that everybody can probably agree is reasonable is to either pass down the
end-of-io callback to ll_rw_block as you suggested, or, preferably by just
forcing the _caller_ to do the buffer locking, and just do the b_end_io
stuff inside the buffer lock and get rid of all the races that way
instead (and make ll_rw_block() verify that the buffers it is passed are
always locked).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-08 Thread Alexander Viro



On Fri, 8 Dec 2000, Linus Torvalds wrote:

 Looking more at this issue, I suspect that the easiest pretty solution
 that everybody can probably agree is reasonable is to either pass down the
 end-of-io callback to ll_rw_block as you suggested, or, preferably by just
 forcing the _caller_ to do the buffer locking, and just do the b_end_io
 stuff inside the buffer lock and get rid of all the races that way
 instead (and make ll_rw_block() verify that the buffers it is passed are
 always locked).

Hmm... I've looked through the ll_rw_block() callers and yes, it seems
to be doable. BTW, we probably want a helper function that would do
lock+ll_rw_block+wait - it will simplify the life in filesystems.
I'll do a patch tonight.
Cheers,
Al
PS: alpha-testers needed for sysvfs patches - right now the thing is racey
as hell and unlike minixfs I can't test it myself. It's more or less
straightforward port of minixfs patch that went into the tree sometime
ago, so it shouldn't be too bad, but... IOW, if somebody has boxen to
test the thing on - yell.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Tom Leete

Linus Torvalds wrote:
> 
> It's not a new bug - it's an old bug that apparently is uncovered by a
> new stricter test.
> 
> Apparently loopback unlocks an already unlocked page - which has always
> been a serious offense, but has never been detected before.
> 
> test12-pre6+ detects it, and thus the BUG().
> 
> Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
> it's impossible to debug, but it's already interesting information to
> see that it seems to be either loopback of vfat.
> 
> Can you test some more? In particular, I'd love to hear if this happens
> with vfat even without loopback, or with loopback even without vfat
> (make an ext2 filesystem or similar instead). That woul dnarrow down the
> bug further.
> 
> Thanks,
> Linus

Hi,

Here is a rather different datapoint. I hope it's different enough to help
nail this.

test12-pre5 + kdb + Serial Console. Sorry, I didn't get the contents of
pointer args.

It's probably worth saying that this kernel was compiled with gcc 2.95.2. I
have the blessed egcs also, will compile pre7 with that and see what
happens.

Cheers,
Tom

from nfs mounted ext2 (2.4.0-test12-pre5):
kernel BUG at buffer.c:827!

Entering kdb (current=0xc2014000, pid 466) Panic: invalid operand
due to panic @ 0xc0130c73
eax = 0x001c ebx = 0xc109ebf8 ecx = 0x edx = 0x0006 
esi = 0xc2739af0 edi = 0xc2739b38 esp = 0xc2015dd4 eip = 0xc0130c73 
ebp = 0xc2015df4 xss = 0x0018 xcs = 0xc2010010 eflags = 0x00010046 
xds = 0x0018 xes = 0x0018 origeax = 0x  = 0xc2015da0
kdb> bt
EBP   EIP Function(args)
0xc2015df4 0xc0130c73 end_buffer_io_async+0xd3 (0xc2739af0, 0x1)
   kernel .text 0xc010 0xc0130ba0 0xc0130cb0
0xc2015e10 0xc0164756 end_that_request_first+0x66 (0xc11c2c20, 0x1,
0xc031cf04)
   kernel .text 0xc010 0xc01646f0 0xc01647b0
0xc2015e30 0xc018a128 ide_end_request+0x28 (0x1, 0xc11c5060)
   kernel .text 0xc010 0xc018a100 0xc018a180
0xc2015e64 0xc018e614 read_intr+0x104 (0xc031ce20)
   kernel .text 0xc010 0xc018e510 0xc018e650
0xc2015e88 0xc018bad6 ide_intr+0x106 (0xe, 0xc11c5060, 0xc2015ed4, 0x1c0)
   kernel .text 0xc010 0xc018b9d0 0xc018bb30
0xc2015ea8 0xc010ab50 handle_IRQ_event+0x30 (0xe, 0xc2015ed4, 0xc11de2e0)
   kernel .text 0xc010 0xc010ab20 0xc010ab80
0xc2015ecc 0xc010ace2 do_IRQ+0x72 (0xc02f4520, 0xc02a92ac, 0xc201c000,
0xc201c000, 0xfc18)
   kernel .text 0xc010 0xc010ac70 0xc010ad30
   0xc01093f0 ret_from_intr
   kernel .text 0xc010 0xc01093f0 0xc0109410
Interrupt registers:
eax = 0x0019 ebx = 0xc02f4520 ecx = 0xc02a92ac edx = 0xc201c000 
esi = 0xc201c000 edi = 0xfc18 esp = 0xc2015f08 eip = 0xc0115816 
ebp = 0xc2015f4c xss = 0x0018 xcs = 0xc010 eflags = 0x0287 
xds = 0xc2070018 xes = 0xc2070018 origeax = 0xff0e  = 0xc2015ed4 
   0xc0115816 schedule+0x1b6
   kernel .text 0xc010 0xc0115660 0xc0115b00
0xc2015f70 0xc01155c7 schedule_timeout+0x17 (0xc2014000, 0x1785222)
   kernel .text 0xc010 0xc01155b0 0xc0115650
0xc2015fac 0xc4055753 [sunrpc]svc_recv+0x1a3 (0xc24b2470, 0xc207be00,
0x7fff)
   sunrpc .text 0xc404e060 0xc40555b0 0xc4055940
0xc2015fec 0xc40713f3 [nfsd]nfsd+0x253
   nfsd .text 0xc4071060 0xc40711a0 0xc40714a0
   0xc0107843 kernel_thread+0x23
   kernel .text 0xc010 0xc0107820 0xc0107860
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Keith Owens

On Thu, 07 Dec 2000 17:23:51 -0800, 
Joseph Cheek <[EMAIL PROTECTED]> wrote:
>i'll check it out.  i'm compiling ksymoops now, is there a way to get it to
>work without a static libbfd?  all i've got is a libbfd.so, and i'm going to
>need to recompile binutils if i must have a libbfd.a.

ksymoops/Makefile, change
$(CC) $(OBJECTS) $(CFLAGS) -Wl,-Bstatic -lbfd -liberty -Wl,-Bdynamic -o $@
to
$(CC) $(OBJECTS) $(CFLAGS) -lbfd -liberty -o $@

But you should have libbfd.a.  Debian splits binutils into binutils and
binutils-dev, libbfd.a might be in the latter.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Joseph Cheek

hi,

comments below.

Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>,
> Joseph Cheek  <[EMAIL PROTECTED]> wrote:
> >copying files off a loopback-mounted vfat filesystem exposes this bug.
> >test11 worked fine.
>
> It's not a new bug - it's an old bug that apparently is uncovered by a
> new stricter test.
>
> Apparently loopback unlocks an already unlocked page - which has always
> been a serious offense, but has never been detected before.
>
> test12-pre6+ detects it, and thus the BUG().
>
> Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
> it's impossible to debug, but it's already interesting information to
> see that it seems to be either loopback of vfat.
>
> Can you test some more? In particular, I'd love to hear if this happens
> with vfat even without loopback, or with loopback even without vfat
> (make an ext2 filesystem or similar instead). That woul dnarrow down the
> bug further.

this happens on loopbacked ext2, and not on regular vfat.  so it appears that
the culprit is loopback.  i got ksymoops working, here are the traces from
the vfat-over-loopback [first] and the ext2-over-loopback [second].

again, these are copied by hand, so i give it a 1% chance of
mistranscription.

ksymoops 2.3.5 on i686 2.4.0.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (default)
 -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

kernel BUG at buffer.c:827!
invalid operand: 
CPU: 0
EIP: 0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010082
eax: 001c ebx: c1d8fc60 ecx:  edx: 0001
esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0
ds: 0018 es: 0018 ss: 0018
Process cp (pid 762, stackpage=c1793000)
Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001
c01610e1
   c1d8fc60 0001 c1cef420  c1cef420 c02c8ed8 c88df91c
c1cef420
   0001 c88e0986 0007  0001 c02c8ed8 c02c8ee8
c4f18800
Call Trace: [] [] [] []
[] [] [] []
   [] [] [] [] []
[] [] [c0128720>]
   [] [] [c010b56b>]
Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00

>>EIP; c013660c<=
Trace; c01fe484 
Trace; c01fe95a 
Trace; c0130703 <__alloc_pages+df/2d4>
Trace; c8895de3 <[fat]fat_readpage+f/14>
Trace; c88df91c <[cdrom]cdrom_read_mech_status+c/4c>
Trace; c8894494 <[fat]fat_get_block+0/e4>
Trace; c0160d2f <__make_request+5cb/648>
Trace; c0160ead 
Trace; c0161011 
Trace; c0137a49 
Trace; c0130703 <__alloc_pages+df/2d4>
Trace; c8895de3 <[fat]fat_readpage+f/14>
Trace; c8894494 <[fat]fat_get_block+0/e4>
Trace; c01284d3 
Trace; c012887b 
Trace; c889448d <[fat]fat_file_read+2d/34>
Trace; c01349a7 
Code;  c013660c 
 <_EIP>:
Code;  c013660c<=
   0:   0f 0b ud2a  <=
Code;  c013660e 
   2:   83 c4 0c  add$0xc,%esp
Code;  c0136611 
   5:   8d 5e 28  lea0x28(%esi),%ebx
Code;  c0136614 
   8:   8d 46 2c  lea0x2c(%esi),%eax
Code;  c0136617 
   b:   39 46 2c  cmp%eax,0x2c(%esi)
Code;  c013661a 
   e:   74 24 je 34 <_EIP+0x34> c0136640

Code;  c013661c 
  10:   b9 01 00 00 00mov$0x1,%ecx


1 warning issued.  Results may not be reliable.

ksymoops 2.3.5 on i686 2.4.0.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (default)
 -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

kernel BUG at buffer.c:827!
invalid operand: 
CPU: 0
EIP: 0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010086
eax: 001c ebx: c1d212a0 ecx:  edx: 0001
esi: c11274bc edi: 0002 ebp: c1d212e8 esp: c4f1bddc
ds: 0018 es: 0018 ss: 0018
Process cp (pid 772, stackpage=c4f1b000)
Stack: c01fe484 c01fe95a 033b c1d21290 c1983420 0002 0001
c01610e1
   c1d212a0 0001 c1983420  c1983420 c02c8ed8 c88df91c
c1983420
   0001 c88e0986 0007  0002 c02c8ed8 c02c8ee8
c4cdf998
Call Trace: [] [] [] [] []
[] []
   [] [] [] [] []
[] [c0128720>]
   [] 

Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Joseph Cheek

i'll check it out.  i'm compiling ksymoops now, is there a way to get it to
work without a static libbfd?  all i've got is a libbfd.so, and i'm going to
need to recompile binutils if i must have a libbfd.a.

Linus Torvalds wrote:

> Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
> it's impossible to debug, but it's already interesting information to
> see that it seems to be either loopback of vfat.
>
> Can you test some more? In particular, I'd love to hear if this happens
> with vfat even without loopback, or with loopback even without vfat
> (make an ext2 filesystem or similar instead). That woul dnarrow down the
> bug further.

thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare.  Support for the Revolution.  | [EMAIL PROTECTED]
CTO / Acting PM, Redmond Linux Project   | [EMAIL PROTECTED]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Joseph Cheek  <[EMAIL PROTECTED]> wrote:
>copying files off a loopback-mounted vfat filesystem exposes this bug.
>test11 worked fine.

It's not a new bug - it's an old bug that apparently is uncovered by a
new stricter test.

Apparently loopback unlocks an already unlocked page - which has always
been a serious offense, but has never been detected before.

test12-pre6+ detects it, and thus the BUG().

Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
it's impossible to debug, but it's already interesting information to
see that it seems to be either loopback of vfat.

Can you test some more? In particular, I'd love to hear if this happens
with vfat even without loopback, or with loopback even without vfat
(make an ext2 filesystem or similar instead). That woul dnarrow down the
bug further.

Thanks,
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Keith Owens

On Thu, 07 Dec 2000 14:42:38 -0800, 
Joseph Cheek <[EMAIL PROTECTED]> wrote:
>loop.o built as module.  this hard crashes the machine, every time
>[PIII-450].  i don't know how to debug this, is there a FAQ?

linux/Documentation/oops-tracing.txt

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Joseph Cheek

copying files off a loopback-mounted vfat filesystem exposes this bug.
test11 worked fine.

loop.o built as module.  this hard crashes the machine, every time
[PIII-450].  i don't know how to debug this, is there a FAQ?

[transcribed by hand]:

# mount -o loop /tmp/cdboot.288 /mnt/cd
# cd /mnt/cd
# cp menu.lst /tmp
kernel BUG at buffer.c:827!
invalid operand: 
CPU: 0
EIP: 0010:[]
EFLAGS: 00010082
eax: 001c ebx: c1d8fc60 ecx:  edx: 0001
esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0
ds: 0018 es: 0018 ss: 0018
Process cp (pid 762, stackpage=c1793000)
Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001
c01610e1
   c1d8fc60 0001 c1cef420  c1cef420 c02c8ed8 c88df91c
c1cef420
   0001 c88e0986 0007  0001 c02c8ed8 c02c8ee8
c4f18800
Call Trace: [] [] [] []
[] [] [] []
   [] [] [] [] []
[] [] [c0128720>]
   [] [] [c010b56b>]
Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00

as soon as i reboot i will look what's at buffer.c:827



--
thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare.  Support for the Revolution.  | [EMAIL PROTECTED]
CTO / Acting PM, Redmond Linux Project   | [EMAIL PROTECTED]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Joseph Cheek

copying files off a loopback-mounted vfat filesystem exposes this bug.
test11 worked fine.

loop.o built as module.  this hard crashes the machine, every time
[PIII-450].  i don't know how to debug this, is there a FAQ?

[transcribed by hand]:

# mount -o loop /tmp/cdboot.288 /mnt/cd
# cd /mnt/cd
# cp menu.lst /tmp
kernel BUG at buffer.c:827!
invalid operand: 
CPU: 0
EIP: 0010:[c013660c]
EFLAGS: 00010082
eax: 001c ebx: c1d8fc60 ecx:  edx: 0001
esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0
ds: 0018 es: 0018 ss: 0018
Process cp (pid 762, stackpage=c1793000)
Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001
c01610e1
   c1d8fc60 0001 c1cef420  c1cef420 c02c8ed8 c88df91c
c1cef420
   0001 c88e0986 0007  0001 c02c8ed8 c02c8ee8
c4f18800
Call Trace: [c01fe484] [c01fe95a] [c0130703] [c8895de3]
[c88df91c] [c8894494] [c0160d2f] [c0160ead]
   [c0161011] [c0137a49] [c0130703] [c8895de3] [c8894494]
[c01284d3] [c012887b] [c0128720]
   [c889448d] [c01349a7] [c010b56b]
Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00

as soon as i reboot i will look what's at buffer.c:827



--
thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare.  Support for the Revolution.  | [EMAIL PROTECTED]
CTO / Acting PM, Redmond Linux Project   | [EMAIL PROTECTED]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Keith Owens

On Thu, 07 Dec 2000 14:42:38 -0800, 
Joseph Cheek [EMAIL PROTECTED] wrote:
loop.o built as module.  this hard crashes the machine, every time
[PIII-450].  i don't know how to debug this, is there a FAQ?

linux/Documentation/oops-tracing.txt

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Joseph Cheek  [EMAIL PROTECTED] wrote:
copying files off a loopback-mounted vfat filesystem exposes this bug.
test11 worked fine.

It's not a new bug - it's an old bug that apparently is uncovered by a
new stricter test.

Apparently loopback unlocks an already unlocked page - which has always
been a serious offense, but has never been detected before.

test12-pre6+ detects it, and thus the BUG().

Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
it's impossible to debug, but it's already interesting information to
see that it seems to be either loopback of vfat.

Can you test some more? In particular, I'd love to hear if this happens
with vfat even without loopback, or with loopback even without vfat
(make an ext2 filesystem or similar instead). That woul dnarrow down the
bug further.

Thanks,
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Joseph Cheek

i'll check it out.  i'm compiling ksymoops now, is there a way to get it to
work without a static libbfd?  all i've got is a libbfd.so, and i'm going to
need to recompile binutils if i must have a libbfd.a.

Linus Torvalds wrote:

 Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
 it's impossible to debug, but it's already interesting information to
 see that it seems to be either loopback of vfat.

 Can you test some more? In particular, I'd love to hear if this happens
 with vfat even without loopback, or with loopback even without vfat
 (make an ext2 filesystem or similar instead). That woul dnarrow down the
 bug further.

thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare.  Support for the Revolution.  | [EMAIL PROTECTED]
CTO / Acting PM, Redmond Linux Project   | [EMAIL PROTECTED]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Joseph Cheek

hi,

comments below.

Linus Torvalds wrote:

 In article [EMAIL PROTECTED],
 Joseph Cheek  [EMAIL PROTECTED] wrote:
 copying files off a loopback-mounted vfat filesystem exposes this bug.
 test11 worked fine.

 It's not a new bug - it's an old bug that apparently is uncovered by a
 new stricter test.

 Apparently loopback unlocks an already unlocked page - which has always
 been a serious offense, but has never been detected before.

 test12-pre6+ detects it, and thus the BUG().

 Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
 it's impossible to debug, but it's already interesting information to
 see that it seems to be either loopback of vfat.

 Can you test some more? In particular, I'd love to hear if this happens
 with vfat even without loopback, or with loopback even without vfat
 (make an ext2 filesystem or similar instead). That woul dnarrow down the
 bug further.

this happens on loopbacked ext2, and not on regular vfat.  so it appears that
the culprit is loopback.  i got ksymoops working, here are the traces from
the vfat-over-loopback [first] and the ext2-over-loopback [second].

again, these are copied by hand, so i give it a 1% chance of
mistranscription.

ksymoops 2.3.5 on i686 2.4.0.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (default)
 -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

kernel BUG at buffer.c:827!
invalid operand: 
CPU: 0
EIP: 0010:[c013660c]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010082
eax: 001c ebx: c1d8fc60 ecx:  edx: 0001
esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0
ds: 0018 es: 0018 ss: 0018
Process cp (pid 762, stackpage=c1793000)
Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001
c01610e1
   c1d8fc60 0001 c1cef420  c1cef420 c02c8ed8 c88df91c
c1cef420
   0001 c88e0986 0007  0001 c02c8ed8 c02c8ee8
c4f18800
Call Trace: [c01fe484] [c01fe95a] [c0130703] [c8895de3]
[c88df91c] [c8894494] [c0160d2f] [c0160ead]
   [c0161011] [c0137a49] [c0130703] [c8895de3] [c8894494]
[c01284d3] [c012887b] [c0128720]
   [c889448d] [c01349a7] [c010b56b]
Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00

EIP; c013660c end_buffer_io_async+e0/11c   =
Trace; c01fe484 tvecs+4fc0/13968
Trace; c01fe95a tvecs+5496/13968
Trace; c0130703 __alloc_pages+df/2d4
Trace; c8895de3 [fat]fat_readpage+f/14
Trace; c88df91c [cdrom]cdrom_read_mech_status+c/4c
Trace; c8894494 [fat]fat_get_block+0/e4
Trace; c0160d2f __make_request+5cb/648
Trace; c0160ead generic_make_request+101/110
Trace; c0161011 ll_rw_block+155/1c4
Trace; c0137a49 block_read_full_page+1fd/2a8
Trace; c0130703 __alloc_pages+df/2d4
Trace; c8895de3 [fat]fat_readpage+f/14
Trace; c8894494 [fat]fat_get_block+0/e4
Trace; c01284d3 do_generic_file_read+337/584
Trace; c012887b generic_file_read+5b/74
Trace; c889448d [fat]fat_file_read+2d/34
Trace; c01349a7 sys_read+8f/c4
Code;  c013660c end_buffer_io_async+e0/11c
 _EIP:
Code;  c013660c end_buffer_io_async+e0/11c   =
   0:   0f 0b ud2a  =
Code;  c013660e end_buffer_io_async+e2/11c
   2:   83 c4 0c  add$0xc,%esp
Code;  c0136611 end_buffer_io_async+e5/11c
   5:   8d 5e 28  lea0x28(%esi),%ebx
Code;  c0136614 end_buffer_io_async+e8/11c
   8:   8d 46 2c  lea0x2c(%esi),%eax
Code;  c0136617 end_buffer_io_async+eb/11c
   b:   39 46 2c  cmp%eax,0x2c(%esi)
Code;  c013661a end_buffer_io_async+ee/11c
   e:   74 24 je 34 _EIP+0x34 c0136640
end_buffer_io_async+114/11c
Code;  c013661c end_buffer_io_async+f0/11c
  10:   b9 01 00 00 00mov$0x1,%ecx


1 warning issued.  Results may not be reliable.

ksymoops 2.3.5 on i686 2.4.0.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (default)
 -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

kernel BUG at buffer.c:827!
invalid operand: 
CPU: 0
EIP: 0010:[c013660c]
Using defaults from ksymoops -t 

Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Keith Owens

On Thu, 07 Dec 2000 17:23:51 -0800, 
Joseph Cheek [EMAIL PROTECTED] wrote:
i'll check it out.  i'm compiling ksymoops now, is there a way to get it to
work without a static libbfd?  all i've got is a libbfd.so, and i'm going to
need to recompile binutils if i must have a libbfd.a.

ksymoops/Makefile, change
$(CC) $(OBJECTS) $(CFLAGS) -Wl,-Bstatic -lbfd -liberty -Wl,-Bdynamic -o $@
to
$(CC) $(OBJECTS) $(CFLAGS) -lbfd -liberty -o $@

But you should have libbfd.a.  Debian splits binutils into binutils and
binutils-dev, libbfd.a might be in the latter.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel BUG at buffer.c:827 in test12-pre6 and 7

2000-12-07 Thread Tom Leete

Linus Torvalds wrote:
 
 It's not a new bug - it's an old bug that apparently is uncovered by a
 new stricter test.
 
 Apparently loopback unlocks an already unlocked page - which has always
 been a serious offense, but has never been detected before.
 
 test12-pre6+ detects it, and thus the BUG().
 
 Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so
 it's impossible to debug, but it's already interesting information to
 see that it seems to be either loopback of vfat.
 
 Can you test some more? In particular, I'd love to hear if this happens
 with vfat even without loopback, or with loopback even without vfat
 (make an ext2 filesystem or similar instead). That woul dnarrow down the
 bug further.
 
 Thanks,
 Linus

Hi,

Here is a rather different datapoint. I hope it's different enough to help
nail this.

test12-pre5 + kdb + Serial Console. Sorry, I didn't get the contents of
pointer args.

It's probably worth saying that this kernel was compiled with gcc 2.95.2. I
have the blessed egcs also, will compile pre7 with that and see what
happens.

Cheers,
Tom

from nfs mounted ext2 (2.4.0-test12-pre5):
kernel BUG at buffer.c:827!

Entering kdb (current=0xc2014000, pid 466) Panic: invalid operand
due to panic @ 0xc0130c73
eax = 0x001c ebx = 0xc109ebf8 ecx = 0x edx = 0x0006 
esi = 0xc2739af0 edi = 0xc2739b38 esp = 0xc2015dd4 eip = 0xc0130c73 
ebp = 0xc2015df4 xss = 0x0018 xcs = 0xc2010010 eflags = 0x00010046 
xds = 0x0018 xes = 0x0018 origeax = 0x regs = 0xc2015da0
kdb bt
EBP   EIP Function(args)
0xc2015df4 0xc0130c73 end_buffer_io_async+0xd3 (0xc2739af0, 0x1)
   kernel .text 0xc010 0xc0130ba0 0xc0130cb0
0xc2015e10 0xc0164756 end_that_request_first+0x66 (0xc11c2c20, 0x1,
0xc031cf04)
   kernel .text 0xc010 0xc01646f0 0xc01647b0
0xc2015e30 0xc018a128 ide_end_request+0x28 (0x1, 0xc11c5060)
   kernel .text 0xc010 0xc018a100 0xc018a180
0xc2015e64 0xc018e614 read_intr+0x104 (0xc031ce20)
   kernel .text 0xc010 0xc018e510 0xc018e650
0xc2015e88 0xc018bad6 ide_intr+0x106 (0xe, 0xc11c5060, 0xc2015ed4, 0x1c0)
   kernel .text 0xc010 0xc018b9d0 0xc018bb30
0xc2015ea8 0xc010ab50 handle_IRQ_event+0x30 (0xe, 0xc2015ed4, 0xc11de2e0)
   kernel .text 0xc010 0xc010ab20 0xc010ab80
0xc2015ecc 0xc010ace2 do_IRQ+0x72 (0xc02f4520, 0xc02a92ac, 0xc201c000,
0xc201c000, 0xfc18)
   kernel .text 0xc010 0xc010ac70 0xc010ad30
   0xc01093f0 ret_from_intr
   kernel .text 0xc010 0xc01093f0 0xc0109410
Interrupt registers:
eax = 0x0019 ebx = 0xc02f4520 ecx = 0xc02a92ac edx = 0xc201c000 
esi = 0xc201c000 edi = 0xfc18 esp = 0xc2015f08 eip = 0xc0115816 
ebp = 0xc2015f4c xss = 0x0018 xcs = 0xc010 eflags = 0x0287 
xds = 0xc2070018 xes = 0xc2070018 origeax = 0xff0e regs = 0xc2015ed4 
   0xc0115816 schedule+0x1b6
   kernel .text 0xc010 0xc0115660 0xc0115b00
0xc2015f70 0xc01155c7 schedule_timeout+0x17 (0xc2014000, 0x1785222)
   kernel .text 0xc010 0xc01155b0 0xc0115650
0xc2015fac 0xc4055753 [sunrpc]svc_recv+0x1a3 (0xc24b2470, 0xc207be00,
0x7fff)
   sunrpc .text 0xc404e060 0xc40555b0 0xc4055940
0xc2015fec 0xc40713f3 [nfsd]nfsd+0x253
   nfsd .text 0xc4071060 0xc40711a0 0xc40714a0
   0xc0107843 kernel_thread+0x23
   kernel .text 0xc010 0xc0107820 0xc0107860
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/