Re: [FIXED!] kernel BUG at buffer.c:827 in test12-pre6 and 7
this works fine in pre8. thanks all! Joseph Cheek wrote: > copying files off a loopback-mounted vfat filesystem exposes this bug. > test11 worked fine. > > loop.o built as module. this hard crashes the machine, every time > [PIII-450]. i don't know how to debug this, is there a FAQ? > > [transcribed by hand]: > > # mount -o loop /tmp/cdboot.288 /mnt/cd > # cd /mnt/cd > # cp menu.lst /tmp > kernel BUG at buffer.c:827! > invalid operand: > CPU: 0 > EIP: 0010:[] > EFLAGS: 00010082 > eax: 001c ebx: c1d8fc60 ecx: edx: 0001 > esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0 > ds: 0018 es: 0018 ss: 0018 > Process cp (pid 762, stackpage=c1793000) > Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001 > c01610e1 >c1d8fc60 0001 c1cef420 c1cef420 c02c8ed8 c88df91c > c1cef420 >0001 c88e0986 0007 0001 c02c8ed8 c02c8ee8 > c4f18800 > Call Trace: [] [] [] [] > [] [] [] [] >[] [] [] [] [] > [] [] [c0128720>] >[] [] [c010b56b>] > Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00 > > as soon as i reboot i will look what's at buffer.c:827 thanks! joe -- Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/ Linuxcare. Support for the Revolution. | [EMAIL PROTECTED] CTO / Acting PM, Redmond Linux Project | [EMAIL PROTECTED] 425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [FIXED!] kernel BUG at buffer.c:827 in test12-pre6 and 7
this works fine in pre8. thanks all! Joseph Cheek wrote: copying files off a loopback-mounted vfat filesystem exposes this bug. test11 worked fine. loop.o built as module. this hard crashes the machine, every time [PIII-450]. i don't know how to debug this, is there a FAQ? [transcribed by hand]: # mount -o loop /tmp/cdboot.288 /mnt/cd # cd /mnt/cd # cp menu.lst /tmp kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[c013660c] EFLAGS: 00010082 eax: 001c ebx: c1d8fc60 ecx: edx: 0001 esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0 ds: 0018 es: 0018 ss: 0018 Process cp (pid 762, stackpage=c1793000) Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001 c01610e1 c1d8fc60 0001 c1cef420 c1cef420 c02c8ed8 c88df91c c1cef420 0001 c88e0986 0007 0001 c02c8ed8 c02c8ee8 c4f18800 Call Trace: [c01fe484] [c01fe95a] [c0130703] [c8895de3] [c88df91c] [c8894494] [c0160d2f] [c0160ead] [c0161011] [c0137a49] [c0130703] [c8895de3] [c8894494] [c01284d3] [c012887b] [c0128720] [c889448d] [c01349a7] [c010b56b] Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00 as soon as i reboot i will look what's at buffer.c:827 thanks! joe -- Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/ Linuxcare. Support for the Revolution. | [EMAIL PROTECTED] CTO / Acting PM, Redmond Linux Project | [EMAIL PROTECTED] 425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Mikulas Patocka wrote: > > I did a test. I disabled readahead except for reading all 4 buffers in > map_4sectors. > > I observed 14% slowdown on walking directories with find and 4% slowdown > on grepping one my working directory (10M, 281 files). > > If you can't make it otherwise you can rip readahead out. If there is a > possibility to keep it, it would be better. The absolutely best thing would be to keep the directories in the page cache. At which point the whole issue becomes pretty moot and we could use the page-cache readahead code. Which gets the end right, and can handle stuff that isn't physically contiguous on disk. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Alexander Viro wrote: > On Sat, 9 Dec 2000, Andries Brouwer wrote: > > > On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote: > > > > > > @@ -1210,7 +1204,6 @@ > > > [breada()] > > > Umm... why do we keep it, in the first place? AFAICS the only > > > in-tree user is hpfs_map_sector() and it doesn't look like we really > > > need it there. OTOH, trimming the buffer.c down is definitely nice. > > > Mikulas? > > > > Throw it out. The number of users has diminished over time. > > Recently isofs stopped using breada. > > The hpfs use was broken, I fixed it a bit some time ago, but > > there is nothing against throwing it out altogether, I think. > > I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry) > and it looks like we would be better off doing getblk() on affected sectors > and ll_rw_block() on the whole bunch - we end up calling breada() for > increasing block numbers with decreasing readahead window anyway. > > So it probably should go - it gives no real win. Mikulas has the final > word here - he is the HPFS maintainer, so... I did a test. I disabled readahead except for reading all 4 buffers in map_4sectors. I observed 14% slowdown on walking directories with find and 4% slowdown on grepping one my working directory (10M, 281 files). If you can't make it otherwise you can rip readahead out. If there is a possibility to keep it, it would be better. Mikulas - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Linus Torvalds <[EMAIL PROTECTED]> writes: > > Modulo the comments above - fine with me. However, there is stuff in > > drivers/md that I don't understand. Ingo, could you comment on the use of > > ->b_end_io there? > > I already sent him mail about it for the same reason. How about sending mail to all the journaling FS groups too? -- the change is surely to break all the JFS which use usually b_end_io to submit the data after the journal has been flushed. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Alexander Viro wrote: > Fine > > - atomic_inc(>b_count); > > Why? It's cleaner the old way - why bother postponing that until we > lock the thing? I had a rule: we always do the "lock_buffer()" and "b_count increment" together with setting "b_end_io = end_buffer_io_async". Why? Because that way we pair up the actions, and I could easily verify that every single user of "end_buffer_io_async" will increment the count (that is decremented in "end_buffer_io_async"). We never used to have any rules in this area, and it was sometimes hard to match up the actions with each other. > > int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size) > > > if (!page->buffers) > > + create_page_buffers(rw, page, dev, b, size); > > create_empty_buffers(page, dev, size); Agreed. > Modulo the comments above - fine with me. However, there is stuff in > drivers/md that I don't understand. Ingo, could you comment on the use of > ->b_end_io there? I already sent him mail about it for the same reason. > Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't > even check for buffers being mapped, let alone attempt to map them. > Fortunately, ext2 doesn't use it these days, but the rest of block > filesystems... WTF? fsync() merrily ignores mmap()'ed stuff? fsync() has _always_ ignored mmap'ed stuff. If you want to get your mappings synchronized, you absolutely positively have to call "msync()". Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: > BTW, what do you think about the following: > * add_to_page_cache() is not exported and never used. Kill? I have my eye on that for execute-in-place of romfs from real ROM chips - making up struct pages for parts of the ROM chips and inserting them into the page cache. I'd rather you didn't remove it :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Date:Sat, 9 Dec 2000 00:45:51 -0800 (PST) From: Linus Torvalds <[EMAIL PROTECTED]> out: -if (nr) { -ll_rw_block(WRITE, nr, arr); -} else { -UnlockPage(page); -} +UnlockPage(page); ClearPageUptodate(page); return err; } @@ -1669,7 +1665,6 @@ It would seem that you would want to unlock the page _after_ clearing the uptodate bit to make sure people sleeping on the page do not see it set by accident. Later, David S. Miller [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Andries Brouwer wrote: > On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote: > > > > @@ -1210,7 +1204,6 @@ > > [breada()] > > Umm... why do we keep it, in the first place? AFAICS the only > > in-tree user is hpfs_map_sector() and it doesn't look like we really > > need it there. OTOH, trimming the buffer.c down is definitely nice. > > Mikulas? > > Throw it out. The number of users has diminished over time. > Recently isofs stopped using breada. > The hpfs use was broken, I fixed it a bit some time ago, but > there is nothing against throwing it out altogether, I think. I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry) and it looks like we would be better off doing getblk() on affected sectors and ll_rw_block() on the whole bunch - we end up calling breada() for increasing block numbers with decreasing readahead window anyway. So it probably should go - it gives no real win. Mikulas has the final word here - he is the HPFS maintainer, so... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote: > > @@ -1210,7 +1204,6 @@ > [breada()] > Umm... why do we keep it, in the first place? AFAICS the only > in-tree user is hpfs_map_sector() and it doesn't look like we really > need it there. OTOH, trimming the buffer.c down is definitely nice. > Mikulas? Throw it out. The number of users has diminished over time. Recently isofs stopped using breada. The hpfs use was broken, I fixed it a bit some time ago, but there is nothing against throwing it out altogether, I think. Andries - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Linus Torvalds wrote: > This is a preliminary patch that I have not compiled and probably breaks, > but you get the idea. I'm going to sleep, to survive another night with > three small kids. > > If somebody wants to run with this, please try it out, but realize that > it's a work-in-progress. And realize that bugs in this area tend to > corrupt filesystems very quickly indeed. I would strongly advice against > trying this patch out without really grokking what it does and feeling > confident that it actually works. > > NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried > to make all the changes truly straightforward and obvious. I want this bug > squashed, and I don't want to see it again. But I _still_ think this is a > dangerous patch until somebody like Al has given it a green light. Caveat > Emptor. > @@ -1001,7 +995,7 @@ >* and it is clean. >*/ > if (bh) { > - init_buffer(bh, end_buffer_io_sync, NULL); > + init_buffer(bh, end_buffer_io_bad, NULL); How about NULL? > @@ -1210,7 +1204,6 @@ [breada()] Umm... why do we keep it, in the first place? AFAICS the only in-tree user is hpfs_map_sector() and it doesn't look like we really need it there. OTOH, trimming the buffer.c down is definitely nice. Mikulas? > @@ -1439,7 +1431,7 @@ [create_page_buffers()] Linus, compare the result with create_empty_buffers(). Then look at the only caller and notice that it will merrily loop over these buffer_heads. IOW, I propose to mark them mapped and set the ->b_blocknr in brw_page(), replace create_page_buffers() call with create_empty_buffers() and let create_page_buffers() go to hell. Oh, and let create_empty_buffers() take device as an argument, not inode as it does now. > @@ -1600,6 +1590,8 @@ [__block_write_full_page()] > > bh = head; > i = 0; > + > + /* Stage 1: make sure we have all the buffers mapped! */ > do { > /* >* If the buffer isn't up-to-date, we can't be sure Ahem. Think what will happen if we are sitting over the hole in file, map some buffers and fail in the middle of the fun. Notice that we do not submit any IO requests in that case, i.e. we just had created a random crap in file. More recovery needed here. I would just break out of the mapping loop, then proceed as above, but limited the activity to mapped blocks only. And did if (err) ClearPageUptodate(page) in the end. > @@ -1669,7 +1665,6 @@ [__block_prepare_write()] Same problem with recovery - see the patch I've sent recently (handling get_block() failures). > @@ -1793,35 +1787,40 @@ [block_read_full_page()] > - init_buffer(bh, end_buffer_io_async, NULL); Fine > - atomic_inc(>b_count); Why? It's cleaner the old way - why bother postponing that until we lock the thing? > + /* Stage two: lock the buffers */ > + for (i = 0; i < nr; i++) { > + struct buffer_head * bh = arr[i]; > + lock_buffer(bh); > + bh->b_end_io = end_buffer_io_async; > + atomic_inc(>b_count); See above. > @@ -2263,67 +2261,31 @@ > */ > int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size) > if (!page->buffers) > + create_page_buffers(rw, page, dev, b, size); create_empty_buffers(page, dev, size); > + > + head = bh = page->buffers; > + if (!head) > BUG(); > > - head = page->buffers; > - bh = head; > - nr = 0; > + /* Stage 1: lock all the buffers */ > do { > - block = *(b++); > + lock_buffer(bh); > + bh->b_end_io = end_buffer_io_async; bh->b_blocknr = *(b++); set_bit(BH_Mapped, >b_state); Modulo the comments above - fine with me. However, there is stuff in drivers/md that I don't understand. Ingo, could you comment on the use of ->b_end_io there? Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't even check for buffers being mapped, let alone attempt to map them. Fortunately, ext2 doesn't use it these days, but the rest of block filesystems... WTF? fsync() merrily ignores mmap()'ed stuff? I mean, we can mmap() an area, dirty the bloody thing, call fsync() and get zero traffic. Hmm... OTOH, it might be correct behaviour... Anyway, fsync() on block filesystems other than ext2 needs fixing. Badly. I'll port Stephen's patch to them. IMO patch is mostly safe (the worst part is error recovery on block_write_full_page()), except the writeout_one_page() part and possibly the RAID stuff. Linus, Ingo, Mikulas - comments? Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Linus Torvalds wrote: > > This is a preliminary patch that I have not compiled and probably breaks, > but you get the idea. I'm going to sleep, to survive another night with > three small kids. Call me stupid [ Chorus: "You're stupid, Linus" ], but I actually compiled and booted this remotely. And it came up. Which is a pretty good sign that it doesn't have anything really broken in it. Still, it would be good to have a few other sharp eyes looking it over. Look sclean and obviously correct to me, but then _everything_ I write always looks obviously correct yo me. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: > On Fri, 8 Dec 2000, Linus Torvalds wrote: > > > Looking more at this issue, I suspect that the easiest pretty solution > > that everybody can probably agree is reasonable is to either pass down the > > end-of-io callback to ll_rw_block as you suggested, or, preferably by just > > forcing the _caller_ to do the buffer locking, and just do the b_end_io > > stuff inside the buffer lock and get rid of all the races that way > > instead (and make ll_rw_block() verify that the buffers it is passed are > > always locked). > > Hmm... I've looked through the ll_rw_block() callers and yes, it seems > to be doable. Looking at this, there's a _lot_ of them. I've taken a test-approach that is fairly simple: - get rid of the old "ll_rw_block()", because it was inherently racey wrt bh->b_end_io, I think we all agree that changing bh->b_end_io without holding any locks at all is fairly dangerous. - instead, a simple "submit_bh()" thing, that takes only one locked buffer head at a time, and queues it for IO. The usage would basically be lock_buffer(bh); bh->b_end_io = completion_callback; submit_bh(READ|WRITE, bh); which is a pretty clean and simple interface that has no obvious races - submit_bh() will set bh->b_end_io to the completion callback. - BUT BUT BUT: Because of tons of old users of ll_rw_block(), we introduce a totally new ll_rw_block() that has the same old interface, but basically does void ll_rw_block(int op, int nr, struct buffer_head **array) { int i; for (i = 0; i < nr; i++) { struct buffer_head * bh = array[i]; lock_buffer(bh); bh->b_end_io = end_buffer_io_sync; submit_bh(op, bh); } } Again, the above avoids the race (we never touch b_end_io except with the buffer lock held), and allows all old uses of "ll_rw_block()" _except_ the ones that wanted to do the fancy async callbacks. The advantage? All the regular old code that isn't fancy (ie the low-level filesystems, bread(), breada() etc) will get a working ll_rw_block() with the semantics they want, and we can pretty much prove that they can never get an async handler even by mistake. And the (few) clever routines in fs/buffer.c that currently use ll_rw_block() with async handlers can just be converted to use the submit_bh() interface. This is a preliminary patch that I have not compiled and probably breaks, but you get the idea. I'm going to sleep, to survive another night with three small kids. If somebody wants to run with this, please try it out, but realize that it's a work-in-progress. And realize that bugs in this area tend to corrupt filesystems very quickly indeed. I would strongly advice against trying this patch out without really grokking what it does and feeling confident that it actually works. NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried to make all the changes truly straightforward and obvious. I want this bug squashed, and I don't want to see it again. But I _still_ think this is a dangerous patch until somebody like Al has given it a green light. Caveat Emptor. Linus - diff -u --recursive t12p7/linux/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c --- t12p7/linux/drivers/block/ll_rw_blk.c Thu Dec 7 15:56:25 2000 +++ linux/drivers/block/ll_rw_blk.c Sat Dec 9 00:40:35 2000 @@ -885,6 +885,36 @@ while (q->make_request_fn(q, rw, bh)); } + +/* + * Submit a buffer head for IO. + */ +void submit_bh(int rw, struct buffer_head * bh) +{ + if (!test_bit(BH_Lock, >b_state)) + BUG(); + + set_bit(BH_Req, >b_state); + + /* +* First step, 'identity mapping' - RAID or LVM might +* further remap this. +*/ + bh->b_rdev = bh->b_dev; + bh->b_rsector = bh->b_blocknr * (bh->b_size>>9); + + generic_make_request(rw, bh); +} + +/* + * Default IO end handler, used by "ll_rw_block()". + */ +static void end_buffer_io_sync(struct buffer_head *bh, int uptodate) +{ + mark_buffer_uptodate(bh, uptodate); + unlock_buffer(bh); +} + /* This function can be used to request a number of buffers from a block device. Currently the only restriction is that all buffers must belong to the same device */ @@ -931,7 +961,8 @@ if (test_and_set_bit(BH_Lock, >b_state)) continue; - set_bit(BH_Req, >b_state); + /* We have the buffer lock */ + bh->b_end_io = end_buffer_io_sync; switch(rw) { case WRITE: @@ -954,17 +985,9 @@ end_io: bh->b_end_io(bh, test_bit(BH_Uptodate, >b_state)); continue; - }
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: On Fri, 8 Dec 2000, Linus Torvalds wrote: Looking more at this issue, I suspect that the easiest pretty solution that everybody can probably agree is reasonable is to either pass down the end-of-io callback to ll_rw_block as you suggested, or, preferably by just forcing the _caller_ to do the buffer locking, and just do the b_end_io stuff inside the buffer lock and get rid of all the races that way instead (and make ll_rw_block() verify that the buffers it is passed are always locked). Hmm... I've looked through the ll_rw_block() callers and yes, it seems to be doable. Looking at this, there's a _lot_ of them. I've taken a test-approach that is fairly simple: - get rid of the old "ll_rw_block()", because it was inherently racey wrt bh-b_end_io, I think we all agree that changing bh-b_end_io without holding any locks at all is fairly dangerous. - instead, a simple "submit_bh()" thing, that takes only one locked buffer head at a time, and queues it for IO. The usage would basically be lock_buffer(bh); bh-b_end_io = completion_callback; submit_bh(READ|WRITE, bh); which is a pretty clean and simple interface that has no obvious races - submit_bh() will set bh-b_end_io to the completion callback. - BUT BUT BUT: Because of tons of old users of ll_rw_block(), we introduce a totally new ll_rw_block() that has the same old interface, but basically does void ll_rw_block(int op, int nr, struct buffer_head **array) { int i; for (i = 0; i nr; i++) { struct buffer_head * bh = array[i]; lock_buffer(bh); bh-b_end_io = end_buffer_io_sync; submit_bh(op, bh); } } Again, the above avoids the race (we never touch b_end_io except with the buffer lock held), and allows all old uses of "ll_rw_block()" _except_ the ones that wanted to do the fancy async callbacks. The advantage? All the regular old code that isn't fancy (ie the low-level filesystems, bread(), breada() etc) will get a working ll_rw_block() with the semantics they want, and we can pretty much prove that they can never get an async handler even by mistake. And the (few) clever routines in fs/buffer.c that currently use ll_rw_block() with async handlers can just be converted to use the submit_bh() interface. This is a preliminary patch that I have not compiled and probably breaks, but you get the idea. I'm going to sleep, to survive another night with three small kids. If somebody wants to run with this, please try it out, but realize that it's a work-in-progress. And realize that bugs in this area tend to corrupt filesystems very quickly indeed. I would strongly advice against trying this patch out without really grokking what it does and feeling confident that it actually works. NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried to make all the changes truly straightforward and obvious. I want this bug squashed, and I don't want to see it again. But I _still_ think this is a dangerous patch until somebody like Al has given it a green light. Caveat Emptor. Linus - diff -u --recursive t12p7/linux/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c --- t12p7/linux/drivers/block/ll_rw_blk.c Thu Dec 7 15:56:25 2000 +++ linux/drivers/block/ll_rw_blk.c Sat Dec 9 00:40:35 2000 @@ -885,6 +885,36 @@ while (q-make_request_fn(q, rw, bh)); } + +/* + * Submit a buffer head for IO. + */ +void submit_bh(int rw, struct buffer_head * bh) +{ + if (!test_bit(BH_Lock, bh-b_state)) + BUG(); + + set_bit(BH_Req, bh-b_state); + + /* +* First step, 'identity mapping' - RAID or LVM might +* further remap this. +*/ + bh-b_rdev = bh-b_dev; + bh-b_rsector = bh-b_blocknr * (bh-b_size9); + + generic_make_request(rw, bh); +} + +/* + * Default IO end handler, used by "ll_rw_block()". + */ +static void end_buffer_io_sync(struct buffer_head *bh, int uptodate) +{ + mark_buffer_uptodate(bh, uptodate); + unlock_buffer(bh); +} + /* This function can be used to request a number of buffers from a block device. Currently the only restriction is that all buffers must belong to the same device */ @@ -931,7 +961,8 @@ if (test_and_set_bit(BH_Lock, bh-b_state)) continue; - set_bit(BH_Req, bh-b_state); + /* We have the buffer lock */ + bh-b_end_io = end_buffer_io_sync; switch(rw) { case WRITE: @@ -954,17 +985,9 @@ end_io: bh-b_end_io(bh, test_bit(BH_Uptodate, bh-b_state)); continue; - } - /* -
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Linus Torvalds wrote: This is a preliminary patch that I have not compiled and probably breaks, but you get the idea. I'm going to sleep, to survive another night with three small kids. Call me stupid [ Chorus: "You're stupid, Linus" ], but I actually compiled and booted this remotely. And it came up. Which is a pretty good sign that it doesn't have anything really broken in it. Still, it would be good to have a few other sharp eyes looking it over. Look sclean and obviously correct to me, but then _everything_ I write always looks obviously correct yo me. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Linus Torvalds wrote: This is a preliminary patch that I have not compiled and probably breaks, but you get the idea. I'm going to sleep, to survive another night with three small kids. If somebody wants to run with this, please try it out, but realize that it's a work-in-progress. And realize that bugs in this area tend to corrupt filesystems very quickly indeed. I would strongly advice against trying this patch out without really grokking what it does and feeling confident that it actually works. NOTE NOTE NOTE! I've tried to keep the patch small and simple. I've tried to make all the changes truly straightforward and obvious. I want this bug squashed, and I don't want to see it again. But I _still_ think this is a dangerous patch until somebody like Al has given it a green light. Caveat Emptor. @@ -1001,7 +995,7 @@ * and it is clean. */ if (bh) { - init_buffer(bh, end_buffer_io_sync, NULL); + init_buffer(bh, end_buffer_io_bad, NULL); How about NULL? @@ -1210,7 +1204,6 @@ [breada()] Umm... why do we keep it, in the first place? AFAICS the only in-tree user is hpfs_map_sector() and it doesn't look like we really need it there. OTOH, trimming the buffer.c down is definitely nice. Mikulas? @@ -1439,7 +1431,7 @@ [create_page_buffers()] Linus, compare the result with create_empty_buffers(). Then look at the only caller and notice that it will merrily loop over these buffer_heads. IOW, I propose to mark them mapped and set the -b_blocknr in brw_page(), replace create_page_buffers() call with create_empty_buffers() and let create_page_buffers() go to hell. Oh, and let create_empty_buffers() take device as an argument, not inode as it does now. @@ -1600,6 +1590,8 @@ [__block_write_full_page()] bh = head; i = 0; + + /* Stage 1: make sure we have all the buffers mapped! */ do { /* * If the buffer isn't up-to-date, we can't be sure Ahem. Think what will happen if we are sitting over the hole in file, map some buffers and fail in the middle of the fun. Notice that we do not submit any IO requests in that case, i.e. we just had created a random crap in file. More recovery needed here. I would just break out of the mapping loop, then proceed as above, but limited the activity to mapped blocks only. And did if (err) ClearPageUptodate(page) in the end. @@ -1669,7 +1665,6 @@ [__block_prepare_write()] Same problem with recovery - see the patch I've sent recently (handling get_block() failures). @@ -1793,35 +1787,40 @@ [block_read_full_page()] - init_buffer(bh, end_buffer_io_async, NULL); Fine - atomic_inc(bh-b_count); Why? It's cleaner the old way - why bother postponing that until we lock the thing? + /* Stage two: lock the buffers */ + for (i = 0; i nr; i++) { + struct buffer_head * bh = arr[i]; + lock_buffer(bh); + bh-b_end_io = end_buffer_io_async; + atomic_inc(bh-b_count); See above. @@ -2263,67 +2261,31 @@ */ int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size) if (!page-buffers) + create_page_buffers(rw, page, dev, b, size); create_empty_buffers(page, dev, size); + + head = bh = page-buffers; + if (!head) BUG(); - head = page-buffers; - bh = head; - nr = 0; + /* Stage 1: lock all the buffers */ do { - block = *(b++); + lock_buffer(bh); + bh-b_end_io = end_buffer_io_async; bh-b_blocknr = *(b++); set_bit(BH_Mapped, bh-b_state); Modulo the comments above - fine with me. However, there is stuff in drivers/md that I don't understand. Ingo, could you comment on the use of -b_end_io there? Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't even check for buffers being mapped, let alone attempt to map them. Fortunately, ext2 doesn't use it these days, but the rest of block filesystems... doubletake WTF? fsync() merrily ignores mmap()'ed stuff? I mean, we can mmap() an area, dirty the bloody thing, call fsync() and get zero traffic. Hmm... OTOH, it might be correct behaviour... Anyway, fsync() on block filesystems other than ext2 needs fixing. Badly. I'll port Stephen's patch to them. IMO patch is mostly safe (the worst part is error recovery on block_write_full_page()), except the writeout_one_page() part and possibly the RAID stuff. Linus, Ingo, Mikulas - comments? Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote: @@ -1210,7 +1204,6 @@ [breada()] Umm... why do we keep it, in the first place? AFAICS the only in-tree user is hpfs_map_sector() and it doesn't look like we really need it there. OTOH, trimming the buffer.c down is definitely nice. Mikulas? Throw it out. The number of users has diminished over time. Recently isofs stopped using breada. The hpfs use was broken, I fixed it a bit some time ago, but there is nothing against throwing it out altogether, I think. Andries - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Andries Brouwer wrote: On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote: @@ -1210,7 +1204,6 @@ [breada()] Umm... why do we keep it, in the first place? AFAICS the only in-tree user is hpfs_map_sector() and it doesn't look like we really need it there. OTOH, trimming the buffer.c down is definitely nice. Mikulas? Throw it out. The number of users has diminished over time. Recently isofs stopped using breada. The hpfs use was broken, I fixed it a bit some time ago, but there is nothing against throwing it out altogether, I think. I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry) and it looks like we would be better off doing getblk() on affected sectors and ll_rw_block() on the whole bunch - we end up calling breada() for increasing block numbers with decreasing readahead window anyway. So it probably should go - it gives no real win. Mikulas has the final word here - he is the HPFS maintainer, so... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Date:Sat, 9 Dec 2000 00:45:51 -0800 (PST) From: Linus Torvalds [EMAIL PROTECTED] out: -if (nr) { -ll_rw_block(WRITE, nr, arr); -} else { -UnlockPage(page); -} +UnlockPage(page); ClearPageUptodate(page); return err; } @@ -1669,7 +1665,6 @@ It would seem that you would want to unlock the page _after_ clearing the uptodate bit to make sure people sleeping on the page do not see it set by accident. Later, David S. Miller [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: BTW, what do you think about the following: * add_to_page_cache() is not exported and never used. Kill? I have my eye on that for execute-in-place of romfs from real ROM chips - making up struct pages for parts of the ROM chips and inserting them into the page cache. I'd rather you didn't remove it :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Alexander Viro wrote: Fine - atomic_inc(bh-b_count); Why? It's cleaner the old way - why bother postponing that until we lock the thing? I had a rule: we always do the "lock_buffer()" and "b_count increment" together with setting "b_end_io = end_buffer_io_async". Why? Because that way we pair up the actions, and I could easily verify that every single user of "end_buffer_io_async" will increment the count (that is decremented in "end_buffer_io_async"). We never used to have any rules in this area, and it was sometimes hard to match up the actions with each other. int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size) if (!page-buffers) + create_page_buffers(rw, page, dev, b, size); create_empty_buffers(page, dev, size); Agreed. Modulo the comments above - fine with me. However, there is stuff in drivers/md that I don't understand. Ingo, could you comment on the use of -b_end_io there? I already sent him mail about it for the same reason. Another bad thing is in mm/filemap.c::writeout_one_page() - it doesn't even check for buffers being mapped, let alone attempt to map them. Fortunately, ext2 doesn't use it these days, but the rest of block filesystems... doubletake WTF? fsync() merrily ignores mmap()'ed stuff? fsync() has _always_ ignored mmap'ed stuff. If you want to get your mappings synchronized, you absolutely positively have to call "msync()". Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Linus Torvalds [EMAIL PROTECTED] writes: Modulo the comments above - fine with me. However, there is stuff in drivers/md that I don't understand. Ingo, could you comment on the use of -b_end_io there? I already sent him mail about it for the same reason. How about sending mail to all the journaling FS groups too? -- the change is surely to break all the JFS which use usually b_end_io to submit the data after the journal has been flushed. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Alexander Viro wrote: On Sat, 9 Dec 2000, Andries Brouwer wrote: On Sat, Dec 09, 2000 at 05:40:47AM -0500, Alexander Viro wrote: @@ -1210,7 +1204,6 @@ [breada()] Umm... why do we keep it, in the first place? AFAICS the only in-tree user is hpfs_map_sector() and it doesn't look like we really need it there. OTOH, trimming the buffer.c down is definitely nice. Mikulas? Throw it out. The number of users has diminished over time. Recently isofs stopped using breada. The hpfs use was broken, I fixed it a bit some time ago, but there is nothing against throwing it out altogether, I think. I've looked at the use of hpfs_map_sector() (and hpfs_map_4sectors() - sorry) and it looks like we would be better off doing getblk() on affected sectors and ll_rw_block() on the whole bunch - we end up calling breada() for increasing block numbers with decreasing readahead window anyway. So it probably should go - it gives no real win. Mikulas has the final word here - he is the HPFS maintainer, so... I did a test. I disabled readahead except for reading all 4 buffers in map_4sectors. I observed 14% slowdown on walking directories with find and 4% slowdown on grepping one my working directory (10M, 281 files). If you can't make it otherwise you can rip readahead out. If there is a possibility to keep it, it would be better. Mikulas - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Sat, 9 Dec 2000, Mikulas Patocka wrote: I did a test. I disabled readahead except for reading all 4 buffers in map_4sectors. I observed 14% slowdown on walking directories with find and 4% slowdown on grepping one my working directory (10M, 281 files). If you can't make it otherwise you can rip readahead out. If there is a possibility to keep it, it would be better. The absolutely best thing would be to keep the directories in the page cache. At which point the whole issue becomes pretty moot and we could use the page-cache readahead code. Which gets the end right, and can handle stuff that isn't physically contiguous on disk. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Linus Torvalds wrote: > Looking more at this issue, I suspect that the easiest pretty solution > that everybody can probably agree is reasonable is to either pass down the > end-of-io callback to ll_rw_block as you suggested, or, preferably by just > forcing the _caller_ to do the buffer locking, and just do the b_end_io > stuff inside the buffer lock and get rid of all the races that way > instead (and make ll_rw_block() verify that the buffers it is passed are > always locked). Hmm... I've looked through the ll_rw_block() callers and yes, it seems to be doable. BTW, we probably want a helper function that would do lock+ll_rw_block+wait - it will simplify the life in filesystems. I'll do a patch tonight. Cheers, Al PS: alpha-testers needed for sysvfs patches - right now the thing is racey as hell and unlike minixfs I can't test it myself. It's more or less straightforward port of minixfs patch that went into the tree sometime ago, so it shouldn't be too bad, but... IOW, if somebody has boxen to test the thing on - yell. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: > > I'm quite aware of that fact ;-) However, you said > >On the other hand, I have this suspicion that there is an even simpler >solution: stop using the end_buffer_io_sync version for writes >altogether. > > If that happens (i.e. if write requests resulting from prepare_write()/ > commit_write()/bdflush sequence become async) we must stop unlocking pages > after commit_write(). Essentially it would become unlocker of the same > kind as readpage() and writepage() - callers must assume that page submitted > to commit_write() will eventually be unlocked. You're right, we can't do that for anonymous buffers right now. Mea culpa. Looking more at this issue, I suspect that the easiest pretty solution that everybody can probably agree is reasonable is to either pass down the end-of-io callback to ll_rw_block as you suggested, or, preferably by just forcing the _caller_ to do the buffer locking, and just do the b_end_io stuff inside the buffer lock and get rid of all the races that way instead (and make ll_rw_block() verify that the buffers it is passed are always locked). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Linus Torvalds wrote: > > > On Fri, 8 Dec 2000, Alexander Viro wrote: > > > > Erm... So you want to make ->commit_write() page-unlocking? Fine with me, > > but that will make for somewhat bigger patch. Hey, _you_ are in position > > to change the locking rules, freeze or not, so if it's OK with you... > > No. > > Read the code a bit more. > > commit_write() doesn't start any IO at all. It only marks the buffer > dirty. I'm quite aware of that fact ;-) However, you said On the other hand, I have this suspicion that there is an even simpler solution: stop using the end_buffer_io_sync version for writes altogether. If that happens (i.e. if write requests resulting from prepare_write()/ commit_write()/bdflush sequence become async) we must stop unlocking pages after commit_write(). Essentially it would become unlocker of the same kind as readpage() and writepage() - callers must assume that page submitted to commit_write() will eventually be unlocked. > The dirty flush-out works the way it has always worked. > > (You're right in that the dirty flusher should make sure to change > b_end_io when they do their write-outs - that code used to just depend on > the buffer always having b_end_io magically set) Hmm... IDGI. Do you want the request resulting from commit_write() ("resulting" != "issued") to stay sync (in that case we still need to change block_write_full_page() since the analysis stands - it dirties blocks unconditionally) or you want them to go async (in that case we need to change commit_write() callers)? Could you clarify? > Btw, I also think that the dirty buffer flushing should get the page lock. > Right now it touches the buffer list without holding the lock on the page > that the buffer is on, which means that there is really nothign that > prevents it from racing with the page-based writeout. I don't like that: > it does hold the LRU list lock, so the list state itself is ok, but it > does actually touch part of the buffer state that is not really protected > by the lock. > > I think it ends up being ok because ll_rw_block will ignore buffers that > get output twice, and the rest of the state is handled with atomic > operations (b_count and b_flags), but it's not really proper. If > flush_dirty_buffers() got the page lock, everything would be protected > properly. Thoughts? Umm... I don't think that we need to do it on per-page basis. We need some exclusion, all right, but I'ld rather provide it from block_write_full_page() and its ilk. Hell knows... Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Linus Torvalds wrote: > > On Fri, 8 Dec 2000, Daniel Phillips wrote: > > > > [ flush-buffers taking the page lock ] > > > > This is great when you have buffersize==pagesize. When there are > > multiple buffers per page it means that some of the buffers might have > > to wait for flushing just because bdflush started IO on some other > > buffer on the same page. Oh well. The common case improves in terms > > being proveably correct and the uncommon case gets worse a tiny bit. > > It sounds like a win. > > Also, I think that we should strive for a setup where most of the dirty > buffer flushing is done through "page_launder()" instead of using > sync_buffers all that much at all. > > I'm convinced that the page LRU list is as least as good as, if not better > than, the dirty buffer timestamp stuff. And as we need to have the page > LRU for other reasons anyway, I'd like the long-range plan to be to get > rid of the buffer LRU completely. It wastes memory and increases > complexity for very little gain, I think. > I think flushing pages instead of buffers is a good direction to take. Two things: 1. currently bdflush is setup to use page_launder only under memory pressure (if (free_shortage() ... ) Do you think that it should call page_launder regardless? 2. There are two operations here: a. starting a write-back, periodically. b. freeing a page, which may involve taking the page out of a inode mapping, etc. IOW, what page_launder does. bdflush primarily does (a). If we want to move to page-oriented flushing, we atleast need extra information in the _page_ structure as to whether it is time to flush the page back. -- Rajagopal Ananthanarayanan ("ananth") Member Technical Staff, SGI. -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Daniel Phillips wrote: > > [ flush-buffers taking the page lock ] > > This is great when you have buffersize==pagesize. When there are > multiple buffers per page it means that some of the buffers might have > to wait for flushing just because bdflush started IO on some other > buffer on the same page. Oh well. The common case improves in terms > being proveably correct and the uncommon case gets worse a tiny bit. > It sounds like a win. Also, I think that we should strive for a setup where most of the dirty buffer flushing is done through "page_launder()" instead of using sync_buffers all that much at all. I'm convinced that the page LRU list is as least as good as, if not better than, the dirty buffer timestamp stuff. And as we need to have the page LRU for other reasons anyway, I'd like the long-range plan to be to get rid of the buffer LRU completely. It wastes memory and increases complexity for very little gain, I think. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 08 Dec 2000, Linus Torvalds wrote: > Btw, I also think that the dirty buffer flushing should get the page lock. > Right now it touches the buffer list without holding the lock on the page > that the buffer is on, which means that there is really nothign that > prevents it from racing with the page-based writeout. I don't like that: > it does hold the LRU list lock, so the list state itself is ok, but it > does actually touch part of the buffer state that is not really protected > by the lock. > > I think it ends up being ok because ll_rw_block will ignore buffers that > get output twice, and the rest of the state is handled with atomic > operations (b_count and b_flags), but it's not really proper. If > flush_dirty_buffers() got the page lock, everything would be protected > properly. Thoughts? This is great when you have buffersize==pagesize. When there are multiple buffers per page it means that some of the buffers might have to wait for flushing just because bdflush started IO on some other buffer on the same page. Oh well. The common case improves in terms being proveably correct and the uncommon case gets worse a tiny bit. It sounds like a win. We are moving steadily away from buffer oriented I/O anyway, and I think we can optimize the case of buffersize!=pagesize in 2.5 by being a little more careful about how getblk hands out buffers. Getblk could force all buffers on the same page to be on the same major/minor, or even better, to be physically close together. Or more simply, flush_dirty_buffers could check for other dirty buffers on the same page and initiate I/O at the same time. Or both strategies. This is by way of buttressing an argument in favor of simplicity and reliabilty, i.e., being religious about taking the page lock, even when there's a slight cost in the short term. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: > > Erm... So you want to make ->commit_write() page-unlocking? Fine with me, > but that will make for somewhat bigger patch. Hey, _you_ are in position > to change the locking rules, freeze or not, so if it's OK with you... No. Read the code a bit more. commit_write() doesn't start any IO at all. It only marks the buffer dirty. The dirty flush-out works the way it has always worked. (You're right in that the dirty flusher should make sure to change b_end_io when they do their write-outs - that code used to just depend on the buffer always having b_end_io magically set) > Hrm. Why not move setting ->b_end_io to ll_rw_block() while we are at it? > Or into ll_rw_block() wrapper... That's too big a change right now, but yes, in theory. That would make sure that nobody could ever forget to set the "completion" handler. In the meantime, let's just enforce it for ll_rw_block: make sure that there is a 1:1 mapping between setting b_end_io and doing a ll_rw_block (and let's make sure that there are no more of these non-local rules any more). Btw, I also think that the dirty buffer flushing should get the page lock. Right now it touches the buffer list without holding the lock on the page that the buffer is on, which means that there is really nothign that prevents it from racing with the page-based writeout. I don't like that: it does hold the LRU list lock, so the list state itself is ok, but it does actually touch part of the buffer state that is not really protected by the lock. I think it ends up being ok because ll_rw_block will ignore buffers that get output twice, and the rest of the state is handled with atomic operations (b_count and b_flags), but it's not really proper. If flush_dirty_buffers() got the page lock, everything would be protected properly. Thoughts? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: > > Fix: postpone changing ->b_end_io until the call of ll_rw_block(); if by > the time of ll_rw_block() some fragments will still have IO in progress - > wait on them. > > Comments? Yes. On the other hand, I have this suspicion that there is an even simpler solution: stop using the end_buffer_io_sync version for writes altogether. It's really only __block_prepare_write() that can mark the buffers for sync writes, and even that case is fairly bogus: it really only wants to mark the non-upto-date buffers that it wants to _read_ for sync IO, it just "knows" that it is ok to change every buffer it sees. It isn't. Moving the bh->b_end_io = end_buffer_io_sync; into the read-path should be sufficient (hell, we should move the "ll_rw_block(READ, 1, )" away, and make it one single read with ll_rw_block(READ, wait_bh-wait, wait); at the end of the loop. If we do it that way, there is no way the two can clash, because a non-up-to-date buffer head won't ever be touched by the write path, so we can't get this kind of confusion. Al? I'd really prefer this alternative instead. Do you see any problems with it? (New rule that makes 100% sense: NOBODY sets "bh->b_end_io" on any buffer that he isn't actually going to do IO on. And if there is pending IO on the buffer, it had better only be of the same type as the one you're going to do). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Folks, see if the following patch helps. AFAICS it closes a pretty real race - we could call block_write_full_page() for a page that has sync IO in progress and blindly change ->b_end_io callbacks on the bh with pending requests. With a little bit of bad luck they would complete before we got to ll_rw_block(), thus leading to extra UnlockPage(). All it takes is a pageout on a page that got partial write recently - if some fragments were still unmapped we get to call get_block() on them and it can easily block, providing a decent window for that race. Fix: postpone changing ->b_end_io until the call of ll_rw_block(); if by the time of ll_rw_block() some fragments will still have IO in progress - wait on them. Comments? Cheers, Al --- buffer.cFri Dec 8 16:19:53 2000 +++ buffer.c.newFri Dec 8 16:26:44 2000 @@ -1577,6 +1577,26 @@ * "Dirty" is valid only with the last case (mapped+uptodate). */ +static void write_array_async(struct page *page, struct buffer_head **p, int n) +{ + int i; + if (!n) { + UnlockPage(page); + return; + } + /* +* If there are pending requests on these guys - wait before changing +* ->b_end_io. +*/ + for (i=0; ib_state); + set_bit(BH_Dirty, [i]->b_state); + p[i]->b_end_io = end_buffer_io_async; + } + ll_rw_block(WRITE, n, p); +} + /* * block_write_full_page() is SMP-safe - currently it's still * being called with the kernel lock held, but the code is ready. @@ -1616,28 +1636,17 @@ if (buffer_new(bh)) unmap_underlying_metadata(bh); } - set_bit(BH_Uptodate, >b_state); - set_bit(BH_Dirty, >b_state); - bh->b_end_io = end_buffer_io_async; atomic_inc(>b_count); arr[nr++] = bh; bh = bh->b_this_page; block++; } while (bh != head); - if (nr) { - ll_rw_block(WRITE, nr, arr); - } else { - UnlockPage(page); - } + write_array_async(page, arr, nr); SetPageUptodate(page); return 0; out: - if (nr) { - ll_rw_block(WRITE, nr, arr); - } else { - UnlockPage(page); - } + write_array_async(page, arr, nr); ClearPageUptodate(page); return err; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[found?] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
I'm doing some massive grepping (basically, audit of page locking), but nothing relevant so far. There was some catch (aside of documenting the thing and finding several completely unrelated buglets): * ramfs_writepage() doesn't UnlockPage(). Deadlock. * udf_adinicb_writepage() does extra UnlockPage(). I don't see the fsckup in loop.c, though. On the read path it uses do_generic_read_file() and on the write it's essentially the simplified variant of generic_file_write(). Hell knows... It looks like we are getting dirty buffer inheriting end_buffer_io_async from the previous life. Oh, damn it. Indeed. Look: generic_file_write() or lo_send(): lock_page() ->prepare_write() sets sync ->b_end_io ->commit_write()puts them on the dirty list UnlockPage()releases the page lock ... requests are sent to driver page_launder(): TryLockPage() succeeds block_write_full_page() ... goes through the bh'es and sets ->b_end_io to end_buffer_io_async() at that point the last remaining request completes. It calls ->b_end_io(), aka. end_buffer_io_async(). And does UnlockPage(). In the meanwhile, we call ll_rw_block(). Requests are sent again. When _they_ complete we get the second UnlockPage() Now, I might miss some obvious reason why it could never happen. Moreover, the real problem may be completely different - the race above is not too wide. However, I'ld really like to hear why the scenario above is impossible. BTW, the race isn't even that narrow - if ->prepare_write() didn't cover the whole page we've got a get_block() to call and there's a plenty of time when shit can hit the fan - it's a blocking operation, after all. Comments? Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
[EMAIL PROTECTED] said: > Can you test some more? In particular, I'd love to hear if this > happens with vfat even without loopback, or with loopback even without > vfat (make an ext2 filesystem or similar instead). That woul dnarrow > down the bug further. Loopback-mounted iso9660 does it too. This was #3 on my list of test12-pre5 oospen to investigate this weekend :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
[EMAIL PROTECTED] said: Can you test some more? In particular, I'd love to hear if this happens with vfat even without loopback, or with loopback even without vfat (make an ext2 filesystem or similar instead). That woul dnarrow down the bug further. Loopback-mounted iso9660 does it too. This was #3 on my list of test12-pre5 oospen to investigate this weekend :) -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[found?] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
I'm doing some massive grepping (basically, audit of page locking), but nothing relevant so far. There was some catch (aside of documenting the thing and finding several completely unrelated buglets): * ramfs_writepage() doesn't UnlockPage(). Deadlock. * udf_adinicb_writepage() does extra UnlockPage(). I don't see the fsckup in loop.c, though. On the read path it uses do_generic_read_file() and on the write it's essentially the simplified variant of generic_file_write(). Hell knows... It looks like we are getting dirty buffer inheriting end_buffer_io_async from the previous life. Oh, damn it. Indeed. Look: generic_file_write() or lo_send(): lock_page() -prepare_write() sets sync -b_end_io -commit_write()puts them on the dirty list UnlockPage()releases the page lock ... requests are sent to driver page_launder(): TryLockPage() succeeds block_write_full_page() ... goes through the bh'es and sets -b_end_io to end_buffer_io_async() at that point the last remaining request completes. It calls -b_end_io(), aka. end_buffer_io_async(). And does UnlockPage(). In the meanwhile, we call ll_rw_block(). Requests are sent again. When _they_ complete we get the second UnlockPage() Now, I might miss some obvious reason why it could never happen. Moreover, the real problem may be completely different - the race above is not too wide. However, I'ld really like to hear why the scenario above is impossible. BTW, the race isn't even that narrow - if -prepare_write() didn't cover the whole page we've got a get_block() to call and there's a plenty of time when shit can hit the fan - it's a blocking operation, after all. Comments? Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Folks, see if the following patch helps. AFAICS it closes a pretty real race - we could call block_write_full_page() for a page that has sync IO in progress and blindly change -b_end_io callbacks on the bh with pending requests. With a little bit of bad luck they would complete before we got to ll_rw_block(), thus leading to extra UnlockPage(). All it takes is a pageout on a page that got partial write recently - if some fragments were still unmapped we get to call get_block() on them and it can easily block, providing a decent window for that race. Fix: postpone changing -b_end_io until the call of ll_rw_block(); if by the time of ll_rw_block() some fragments will still have IO in progress - wait on them. Comments? Cheers, Al --- buffer.cFri Dec 8 16:19:53 2000 +++ buffer.c.newFri Dec 8 16:26:44 2000 @@ -1577,6 +1577,26 @@ * "Dirty" is valid only with the last case (mapped+uptodate). */ +static void write_array_async(struct page *page, struct buffer_head **p, int n) +{ + int i; + if (!n) { + UnlockPage(page); + return; + } + /* +* If there are pending requests on these guys - wait before changing +* -b_end_io. +*/ + for (i=0; in; i++) { + wait_on_buffer(p[i]); + set_bit(BH_Uptodate, p[i]-b_state); + set_bit(BH_Dirty, p[i]-b_state); + p[i]-b_end_io = end_buffer_io_async; + } + ll_rw_block(WRITE, n, p); +} + /* * block_write_full_page() is SMP-safe - currently it's still * being called with the kernel lock held, but the code is ready. @@ -1616,28 +1636,17 @@ if (buffer_new(bh)) unmap_underlying_metadata(bh); } - set_bit(BH_Uptodate, bh-b_state); - set_bit(BH_Dirty, bh-b_state); - bh-b_end_io = end_buffer_io_async; atomic_inc(bh-b_count); arr[nr++] = bh; bh = bh-b_this_page; block++; } while (bh != head); - if (nr) { - ll_rw_block(WRITE, nr, arr); - } else { - UnlockPage(page); - } + write_array_async(page, arr, nr); SetPageUptodate(page); return 0; out: - if (nr) { - ll_rw_block(WRITE, nr, arr); - } else { - UnlockPage(page); - } + write_array_async(page, arr, nr); ClearPageUptodate(page); return err; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: Fix: postpone changing -b_end_io until the call of ll_rw_block(); if by the time of ll_rw_block() some fragments will still have IO in progress - wait on them. Comments? Yes. On the other hand, I have this suspicion that there is an even simpler solution: stop using the end_buffer_io_sync version for writes altogether. It's really only __block_prepare_write() that can mark the buffers for sync writes, and even that case is fairly bogus: it really only wants to mark the non-upto-date buffers that it wants to _read_ for sync IO, it just "knows" that it is ok to change every buffer it sees. It isn't. Moving the bh-b_end_io = end_buffer_io_sync; into the read-path should be sufficient (hell, we should move the "ll_rw_block(READ, 1, bh)" away, and make it one single read with ll_rw_block(READ, wait_bh-wait, wait); at the end of the loop. If we do it that way, there is no way the two can clash, because a non-up-to-date buffer head won't ever be touched by the write path, so we can't get this kind of confusion. Al? I'd really prefer this alternative instead. Do you see any problems with it? (New rule that makes 100% sense: NOBODY sets "bh-b_end_io" on any buffer that he isn't actually going to do IO on. And if there is pending IO on the buffer, it had better only be of the same type as the one you're going to do). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: Erm... So you want to make -commit_write() page-unlocking? Fine with me, but that will make for somewhat bigger patch. Hey, _you_ are in position to change the locking rules, freeze or not, so if it's OK with you... No. Read the code a bit more. commit_write() doesn't start any IO at all. It only marks the buffer dirty. The dirty flush-out works the way it has always worked. (You're right in that the dirty flusher should make sure to change b_end_io when they do their write-outs - that code used to just depend on the buffer always having b_end_io magically set) Hrm. Why not move setting -b_end_io to ll_rw_block() while we are at it? Or into ll_rw_block() wrapper... That's too big a change right now, but yes, in theory. That would make sure that nobody could ever forget to set the "completion" handler. In the meantime, let's just enforce it for ll_rw_block: make sure that there is a 1:1 mapping between setting b_end_io and doing a ll_rw_block (and let's make sure that there are no more of these non-local rules any more). Btw, I also think that the dirty buffer flushing should get the page lock. Right now it touches the buffer list without holding the lock on the page that the buffer is on, which means that there is really nothign that prevents it from racing with the page-based writeout. I don't like that: it does hold the LRU list lock, so the list state itself is ok, but it does actually touch part of the buffer state that is not really protected by the lock. I think it ends up being ok because ll_rw_block will ignore buffers that get output twice, and the rest of the state is handled with atomic operations (b_count and b_flags), but it's not really proper. If flush_dirty_buffers() got the page lock, everything would be protected properly. Thoughts? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 08 Dec 2000, Linus Torvalds wrote: Btw, I also think that the dirty buffer flushing should get the page lock. Right now it touches the buffer list without holding the lock on the page that the buffer is on, which means that there is really nothign that prevents it from racing with the page-based writeout. I don't like that: it does hold the LRU list lock, so the list state itself is ok, but it does actually touch part of the buffer state that is not really protected by the lock. I think it ends up being ok because ll_rw_block will ignore buffers that get output twice, and the rest of the state is handled with atomic operations (b_count and b_flags), but it's not really proper. If flush_dirty_buffers() got the page lock, everything would be protected properly. Thoughts? This is great when you have buffersize==pagesize. When there are multiple buffers per page it means that some of the buffers might have to wait for flushing just because bdflush started IO on some other buffer on the same page. Oh well. The common case improves in terms being proveably correct and the uncommon case gets worse a tiny bit. It sounds like a win. We are moving steadily away from buffer oriented I/O anyway, and I think we can optimize the case of buffersize!=pagesize in 2.5 by being a little more careful about how getblk hands out buffers. Getblk could force all buffers on the same page to be on the same major/minor, or even better, to be physically close together. Or more simply, flush_dirty_buffers could check for other dirty buffers on the same page and initiate I/O at the same time. Or both strategies. This is by way of buttressing an argument in favor of simplicity and reliabilty, i.e., being religious about taking the page lock, even when there's a slight cost in the short term. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Daniel Phillips wrote: [ flush-buffers taking the page lock ] This is great when you have buffersize==pagesize. When there are multiple buffers per page it means that some of the buffers might have to wait for flushing just because bdflush started IO on some other buffer on the same page. Oh well. The common case improves in terms being proveably correct and the uncommon case gets worse a tiny bit. It sounds like a win. Also, I think that we should strive for a setup where most of the dirty buffer flushing is done through "page_launder()" instead of using sync_buffers all that much at all. I'm convinced that the page LRU list is as least as good as, if not better than, the dirty buffer timestamp stuff. And as we need to have the page LRU for other reasons anyway, I'd like the long-range plan to be to get rid of the buffer LRU completely. It wastes memory and increases complexity for very little gain, I think. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Linus Torvalds wrote: On Fri, 8 Dec 2000, Alexander Viro wrote: Erm... So you want to make -commit_write() page-unlocking? Fine with me, but that will make for somewhat bigger patch. Hey, _you_ are in position to change the locking rules, freeze or not, so if it's OK with you... No. Read the code a bit more. commit_write() doesn't start any IO at all. It only marks the buffer dirty. I'm quite aware of that fact ;-) However, you said On the other hand, I have this suspicion that there is an even simpler solution: stop using the end_buffer_io_sync version for writes altogether. If that happens (i.e. if write requests resulting from prepare_write()/ commit_write()/bdflush sequence become async) we must stop unlocking pages after commit_write(). Essentially it would become unlocker of the same kind as readpage() and writepage() - callers must assume that page submitted to commit_write() will eventually be unlocked. The dirty flush-out works the way it has always worked. (You're right in that the dirty flusher should make sure to change b_end_io when they do their write-outs - that code used to just depend on the buffer always having b_end_io magically set) Hmm... IDGI. Do you want the request resulting from commit_write() ("resulting" != "issued") to stay sync (in that case we still need to change block_write_full_page() since the analysis stands - it dirties blocks unconditionally) or you want them to go async (in that case we need to change commit_write() callers)? Could you clarify? Btw, I also think that the dirty buffer flushing should get the page lock. Right now it touches the buffer list without holding the lock on the page that the buffer is on, which means that there is really nothign that prevents it from racing with the page-based writeout. I don't like that: it does hold the LRU list lock, so the list state itself is ok, but it does actually touch part of the buffer state that is not really protected by the lock. I think it ends up being ok because ll_rw_block will ignore buffers that get output twice, and the rest of the state is handled with atomic operations (b_count and b_flags), but it's not really proper. If flush_dirty_buffers() got the page lock, everything would be protected properly. Thoughts? Umm... I don't think that we need to do it on per-page basis. We need some exclusion, all right, but I'ld rather provide it from block_write_full_page() and its ilk. Hell knows... Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Alexander Viro wrote: I'm quite aware of that fact ;-) However, you said On the other hand, I have this suspicion that there is an even simpler solution: stop using the end_buffer_io_sync version for writes altogether. If that happens (i.e. if write requests resulting from prepare_write()/ commit_write()/bdflush sequence become async) we must stop unlocking pages after commit_write(). Essentially it would become unlocker of the same kind as readpage() and writepage() - callers must assume that page submitted to commit_write() will eventually be unlocked. You're right, we can't do that for anonymous buffers right now. Mea culpa. Looking more at this issue, I suspect that the easiest pretty solution that everybody can probably agree is reasonable is to either pass down the end-of-io callback to ll_rw_block as you suggested, or, preferably by just forcing the _caller_ to do the buffer locking, and just do the b_end_io stuff inside the buffer lock and get rid of all the races that way instead (and make ll_rw_block() verify that the buffers it is passed are always locked). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Fri, 8 Dec 2000, Linus Torvalds wrote: Looking more at this issue, I suspect that the easiest pretty solution that everybody can probably agree is reasonable is to either pass down the end-of-io callback to ll_rw_block as you suggested, or, preferably by just forcing the _caller_ to do the buffer locking, and just do the b_end_io stuff inside the buffer lock and get rid of all the races that way instead (and make ll_rw_block() verify that the buffers it is passed are always locked). Hmm... I've looked through the ll_rw_block() callers and yes, it seems to be doable. BTW, we probably want a helper function that would do lock+ll_rw_block+wait - it will simplify the life in filesystems. I'll do a patch tonight. Cheers, Al PS: alpha-testers needed for sysvfs patches - right now the thing is racey as hell and unlike minixfs I can't test it myself. It's more or less straightforward port of minixfs patch that went into the tree sometime ago, so it shouldn't be too bad, but... IOW, if somebody has boxen to test the thing on - yell. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Linus Torvalds wrote: > > It's not a new bug - it's an old bug that apparently is uncovered by a > new stricter test. > > Apparently loopback unlocks an already unlocked page - which has always > been a serious offense, but has never been detected before. > > test12-pre6+ detects it, and thus the BUG(). > > Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so > it's impossible to debug, but it's already interesting information to > see that it seems to be either loopback of vfat. > > Can you test some more? In particular, I'd love to hear if this happens > with vfat even without loopback, or with loopback even without vfat > (make an ext2 filesystem or similar instead). That woul dnarrow down the > bug further. > > Thanks, > Linus Hi, Here is a rather different datapoint. I hope it's different enough to help nail this. test12-pre5 + kdb + Serial Console. Sorry, I didn't get the contents of pointer args. It's probably worth saying that this kernel was compiled with gcc 2.95.2. I have the blessed egcs also, will compile pre7 with that and see what happens. Cheers, Tom from nfs mounted ext2 (2.4.0-test12-pre5): kernel BUG at buffer.c:827! Entering kdb (current=0xc2014000, pid 466) Panic: invalid operand due to panic @ 0xc0130c73 eax = 0x001c ebx = 0xc109ebf8 ecx = 0x edx = 0x0006 esi = 0xc2739af0 edi = 0xc2739b38 esp = 0xc2015dd4 eip = 0xc0130c73 ebp = 0xc2015df4 xss = 0x0018 xcs = 0xc2010010 eflags = 0x00010046 xds = 0x0018 xes = 0x0018 origeax = 0x = 0xc2015da0 kdb> bt EBP EIP Function(args) 0xc2015df4 0xc0130c73 end_buffer_io_async+0xd3 (0xc2739af0, 0x1) kernel .text 0xc010 0xc0130ba0 0xc0130cb0 0xc2015e10 0xc0164756 end_that_request_first+0x66 (0xc11c2c20, 0x1, 0xc031cf04) kernel .text 0xc010 0xc01646f0 0xc01647b0 0xc2015e30 0xc018a128 ide_end_request+0x28 (0x1, 0xc11c5060) kernel .text 0xc010 0xc018a100 0xc018a180 0xc2015e64 0xc018e614 read_intr+0x104 (0xc031ce20) kernel .text 0xc010 0xc018e510 0xc018e650 0xc2015e88 0xc018bad6 ide_intr+0x106 (0xe, 0xc11c5060, 0xc2015ed4, 0x1c0) kernel .text 0xc010 0xc018b9d0 0xc018bb30 0xc2015ea8 0xc010ab50 handle_IRQ_event+0x30 (0xe, 0xc2015ed4, 0xc11de2e0) kernel .text 0xc010 0xc010ab20 0xc010ab80 0xc2015ecc 0xc010ace2 do_IRQ+0x72 (0xc02f4520, 0xc02a92ac, 0xc201c000, 0xc201c000, 0xfc18) kernel .text 0xc010 0xc010ac70 0xc010ad30 0xc01093f0 ret_from_intr kernel .text 0xc010 0xc01093f0 0xc0109410 Interrupt registers: eax = 0x0019 ebx = 0xc02f4520 ecx = 0xc02a92ac edx = 0xc201c000 esi = 0xc201c000 edi = 0xfc18 esp = 0xc2015f08 eip = 0xc0115816 ebp = 0xc2015f4c xss = 0x0018 xcs = 0xc010 eflags = 0x0287 xds = 0xc2070018 xes = 0xc2070018 origeax = 0xff0e = 0xc2015ed4 0xc0115816 schedule+0x1b6 kernel .text 0xc010 0xc0115660 0xc0115b00 0xc2015f70 0xc01155c7 schedule_timeout+0x17 (0xc2014000, 0x1785222) kernel .text 0xc010 0xc01155b0 0xc0115650 0xc2015fac 0xc4055753 [sunrpc]svc_recv+0x1a3 (0xc24b2470, 0xc207be00, 0x7fff) sunrpc .text 0xc404e060 0xc40555b0 0xc4055940 0xc2015fec 0xc40713f3 [nfsd]nfsd+0x253 nfsd .text 0xc4071060 0xc40711a0 0xc40714a0 0xc0107843 kernel_thread+0x23 kernel .text 0xc010 0xc0107820 0xc0107860 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Thu, 07 Dec 2000 17:23:51 -0800, Joseph Cheek <[EMAIL PROTECTED]> wrote: >i'll check it out. i'm compiling ksymoops now, is there a way to get it to >work without a static libbfd? all i've got is a libbfd.so, and i'm going to >need to recompile binutils if i must have a libbfd.a. ksymoops/Makefile, change $(CC) $(OBJECTS) $(CFLAGS) -Wl,-Bstatic -lbfd -liberty -Wl,-Bdynamic -o $@ to $(CC) $(OBJECTS) $(CFLAGS) -lbfd -liberty -o $@ But you should have libbfd.a. Debian splits binutils into binutils and binutils-dev, libbfd.a might be in the latter. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
hi, comments below. Linus Torvalds wrote: > In article <[EMAIL PROTECTED]>, > Joseph Cheek <[EMAIL PROTECTED]> wrote: > >copying files off a loopback-mounted vfat filesystem exposes this bug. > >test11 worked fine. > > It's not a new bug - it's an old bug that apparently is uncovered by a > new stricter test. > > Apparently loopback unlocks an already unlocked page - which has always > been a serious offense, but has never been detected before. > > test12-pre6+ detects it, and thus the BUG(). > > Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so > it's impossible to debug, but it's already interesting information to > see that it seems to be either loopback of vfat. > > Can you test some more? In particular, I'd love to hear if this happens > with vfat even without loopback, or with loopback even without vfat > (make an ext2 filesystem or similar instead). That woul dnarrow down the > bug further. this happens on loopbacked ext2, and not on regular vfat. so it appears that the culprit is loopback. i got ksymoops working, here are the traces from the vfat-over-loopback [first] and the ext2-over-loopback [second]. again, these are copied by hand, so i give it a 1% chance of mistranscription. ksymoops 2.3.5 on i686 2.4.0. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010082 eax: 001c ebx: c1d8fc60 ecx: edx: 0001 esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0 ds: 0018 es: 0018 ss: 0018 Process cp (pid 762, stackpage=c1793000) Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001 c01610e1 c1d8fc60 0001 c1cef420 c1cef420 c02c8ed8 c88df91c c1cef420 0001 c88e0986 0007 0001 c02c8ed8 c02c8ee8 c4f18800 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [c0128720>] [] [] [c010b56b>] Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00 >>EIP; c013660c<= Trace; c01fe484 Trace; c01fe95a Trace; c0130703 <__alloc_pages+df/2d4> Trace; c8895de3 <[fat]fat_readpage+f/14> Trace; c88df91c <[cdrom]cdrom_read_mech_status+c/4c> Trace; c8894494 <[fat]fat_get_block+0/e4> Trace; c0160d2f <__make_request+5cb/648> Trace; c0160ead Trace; c0161011 Trace; c0137a49 Trace; c0130703 <__alloc_pages+df/2d4> Trace; c8895de3 <[fat]fat_readpage+f/14> Trace; c8894494 <[fat]fat_get_block+0/e4> Trace; c01284d3 Trace; c012887b Trace; c889448d <[fat]fat_file_read+2d/34> Trace; c01349a7 Code; c013660c <_EIP>: Code; c013660c<= 0: 0f 0b ud2a <= Code; c013660e 2: 83 c4 0c add$0xc,%esp Code; c0136611 5: 8d 5e 28 lea0x28(%esi),%ebx Code; c0136614 8: 8d 46 2c lea0x2c(%esi),%eax Code; c0136617 b: 39 46 2c cmp%eax,0x2c(%esi) Code; c013661a e: 74 24 je 34 <_EIP+0x34> c0136640 Code; c013661c 10: b9 01 00 00 00mov$0x1,%ecx 1 warning issued. Results may not be reliable. ksymoops 2.3.5 on i686 2.4.0. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010086 eax: 001c ebx: c1d212a0 ecx: edx: 0001 esi: c11274bc edi: 0002 ebp: c1d212e8 esp: c4f1bddc ds: 0018 es: 0018 ss: 0018 Process cp (pid 772, stackpage=c4f1b000) Stack: c01fe484 c01fe95a 033b c1d21290 c1983420 0002 0001 c01610e1 c1d212a0 0001 c1983420 c1983420 c02c8ed8 c88df91c c1983420 0001 c88e0986 0007 0002 c02c8ed8 c02c8ee8 c4cdf998 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [c0128720>] []
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
i'll check it out. i'm compiling ksymoops now, is there a way to get it to work without a static libbfd? all i've got is a libbfd.so, and i'm going to need to recompile binutils if i must have a libbfd.a. Linus Torvalds wrote: > Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so > it's impossible to debug, but it's already interesting information to > see that it seems to be either loopback of vfat. > > Can you test some more? In particular, I'd love to hear if this happens > with vfat even without loopback, or with loopback even without vfat > (make an ext2 filesystem or similar instead). That woul dnarrow down the > bug further. thanks! joe -- Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/ Linuxcare. Support for the Revolution. | [EMAIL PROTECTED] CTO / Acting PM, Redmond Linux Project | [EMAIL PROTECTED] 425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
In article <[EMAIL PROTECTED]>, Joseph Cheek <[EMAIL PROTECTED]> wrote: >copying files off a loopback-mounted vfat filesystem exposes this bug. >test11 worked fine. It's not a new bug - it's an old bug that apparently is uncovered by a new stricter test. Apparently loopback unlocks an already unlocked page - which has always been a serious offense, but has never been detected before. test12-pre6+ detects it, and thus the BUG(). Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so it's impossible to debug, but it's already interesting information to see that it seems to be either loopback of vfat. Can you test some more? In particular, I'd love to hear if this happens with vfat even without loopback, or with loopback even without vfat (make an ext2 filesystem or similar instead). That woul dnarrow down the bug further. Thanks, Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Thu, 07 Dec 2000 14:42:38 -0800, Joseph Cheek <[EMAIL PROTECTED]> wrote: >loop.o built as module. this hard crashes the machine, every time >[PIII-450]. i don't know how to debug this, is there a FAQ? linux/Documentation/oops-tracing.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
kernel BUG at buffer.c:827 in test12-pre6 and 7
copying files off a loopback-mounted vfat filesystem exposes this bug. test11 worked fine. loop.o built as module. this hard crashes the machine, every time [PIII-450]. i don't know how to debug this, is there a FAQ? [transcribed by hand]: # mount -o loop /tmp/cdboot.288 /mnt/cd # cd /mnt/cd # cp menu.lst /tmp kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[] EFLAGS: 00010082 eax: 001c ebx: c1d8fc60 ecx: edx: 0001 esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0 ds: 0018 es: 0018 ss: 0018 Process cp (pid 762, stackpage=c1793000) Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001 c01610e1 c1d8fc60 0001 c1cef420 c1cef420 c02c8ed8 c88df91c c1cef420 0001 c88e0986 0007 0001 c02c8ed8 c02c8ee8 c4f18800 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [c0128720>] [] [] [c010b56b>] Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00 as soon as i reboot i will look what's at buffer.c:827 -- thanks! joe -- Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/ Linuxcare. Support for the Revolution. | [EMAIL PROTECTED] CTO / Acting PM, Redmond Linux Project | [EMAIL PROTECTED] 425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
kernel BUG at buffer.c:827 in test12-pre6 and 7
copying files off a loopback-mounted vfat filesystem exposes this bug. test11 worked fine. loop.o built as module. this hard crashes the machine, every time [PIII-450]. i don't know how to debug this, is there a FAQ? [transcribed by hand]: # mount -o loop /tmp/cdboot.288 /mnt/cd # cd /mnt/cd # cp menu.lst /tmp kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[c013660c] EFLAGS: 00010082 eax: 001c ebx: c1d8fc60 ecx: edx: 0001 esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0 ds: 0018 es: 0018 ss: 0018 Process cp (pid 762, stackpage=c1793000) Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001 c01610e1 c1d8fc60 0001 c1cef420 c1cef420 c02c8ed8 c88df91c c1cef420 0001 c88e0986 0007 0001 c02c8ed8 c02c8ee8 c4f18800 Call Trace: [c01fe484] [c01fe95a] [c0130703] [c8895de3] [c88df91c] [c8894494] [c0160d2f] [c0160ead] [c0161011] [c0137a49] [c0130703] [c8895de3] [c8894494] [c01284d3] [c012887b] [c0128720] [c889448d] [c01349a7] [c010b56b] Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00 as soon as i reboot i will look what's at buffer.c:827 -- thanks! joe -- Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/ Linuxcare. Support for the Revolution. | [EMAIL PROTECTED] CTO / Acting PM, Redmond Linux Project | [EMAIL PROTECTED] 425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Thu, 07 Dec 2000 14:42:38 -0800, Joseph Cheek [EMAIL PROTECTED] wrote: loop.o built as module. this hard crashes the machine, every time [PIII-450]. i don't know how to debug this, is there a FAQ? linux/Documentation/oops-tracing.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
In article [EMAIL PROTECTED], Joseph Cheek [EMAIL PROTECTED] wrote: copying files off a loopback-mounted vfat filesystem exposes this bug. test11 worked fine. It's not a new bug - it's an old bug that apparently is uncovered by a new stricter test. Apparently loopback unlocks an already unlocked page - which has always been a serious offense, but has never been detected before. test12-pre6+ detects it, and thus the BUG(). Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so it's impossible to debug, but it's already interesting information to see that it seems to be either loopback of vfat. Can you test some more? In particular, I'd love to hear if this happens with vfat even without loopback, or with loopback even without vfat (make an ext2 filesystem or similar instead). That woul dnarrow down the bug further. Thanks, Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
i'll check it out. i'm compiling ksymoops now, is there a way to get it to work without a static libbfd? all i've got is a libbfd.so, and i'm going to need to recompile binutils if i must have a libbfd.a. Linus Torvalds wrote: Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so it's impossible to debug, but it's already interesting information to see that it seems to be either loopback of vfat. Can you test some more? In particular, I'd love to hear if this happens with vfat even without loopback, or with loopback even without vfat (make an ext2 filesystem or similar instead). That woul dnarrow down the bug further. thanks! joe -- Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/ Linuxcare. Support for the Revolution. | [EMAIL PROTECTED] CTO / Acting PM, Redmond Linux Project | [EMAIL PROTECTED] 425 990-1072 vox [1074 fax] 206 679-6838 pcs | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
hi, comments below. Linus Torvalds wrote: In article [EMAIL PROTECTED], Joseph Cheek [EMAIL PROTECTED] wrote: copying files off a loopback-mounted vfat filesystem exposes this bug. test11 worked fine. It's not a new bug - it's an old bug that apparently is uncovered by a new stricter test. Apparently loopback unlocks an already unlocked page - which has always been a serious offense, but has never been detected before. test12-pre6+ detects it, and thus the BUG(). Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so it's impossible to debug, but it's already interesting information to see that it seems to be either loopback of vfat. Can you test some more? In particular, I'd love to hear if this happens with vfat even without loopback, or with loopback even without vfat (make an ext2 filesystem or similar instead). That woul dnarrow down the bug further. this happens on loopbacked ext2, and not on regular vfat. so it appears that the culprit is loopback. i got ksymoops working, here are the traces from the vfat-over-loopback [first] and the ext2-over-loopback [second]. again, these are copied by hand, so i give it a 1% chance of mistranscription. ksymoops 2.3.5 on i686 2.4.0. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[c013660c] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010082 eax: 001c ebx: c1d8fc60 ecx: edx: 0001 esi: c10658e4 edi: 0002 ebp: c1d8fca8 esp: c1793dc0 ds: 0018 es: 0018 ss: 0018 Process cp (pid 762, stackpage=c1793000) Stack: c01fe484 c01fe95a 033b c1d8fc60 c1cef420 0001 0001 c01610e1 c1d8fc60 0001 c1cef420 c1cef420 c02c8ed8 c88df91c c1cef420 0001 c88e0986 0007 0001 c02c8ed8 c02c8ee8 c4f18800 Call Trace: [c01fe484] [c01fe95a] [c0130703] [c8895de3] [c88df91c] [c8894494] [c0160d2f] [c0160ead] [c0161011] [c0137a49] [c0130703] [c8895de3] [c8894494] [c01284d3] [c012887b] [c0128720] [c889448d] [c01349a7] [c010b56b] Code: 0f 0b 83 c4 0c 8d 5e 28 8d 46 2c 39 46 2c 74 24 b9 01 00 00 EIP; c013660c end_buffer_io_async+e0/11c = Trace; c01fe484 tvecs+4fc0/13968 Trace; c01fe95a tvecs+5496/13968 Trace; c0130703 __alloc_pages+df/2d4 Trace; c8895de3 [fat]fat_readpage+f/14 Trace; c88df91c [cdrom]cdrom_read_mech_status+c/4c Trace; c8894494 [fat]fat_get_block+0/e4 Trace; c0160d2f __make_request+5cb/648 Trace; c0160ead generic_make_request+101/110 Trace; c0161011 ll_rw_block+155/1c4 Trace; c0137a49 block_read_full_page+1fd/2a8 Trace; c0130703 __alloc_pages+df/2d4 Trace; c8895de3 [fat]fat_readpage+f/14 Trace; c8894494 [fat]fat_get_block+0/e4 Trace; c01284d3 do_generic_file_read+337/584 Trace; c012887b generic_file_read+5b/74 Trace; c889448d [fat]fat_file_read+2d/34 Trace; c01349a7 sys_read+8f/c4 Code; c013660c end_buffer_io_async+e0/11c _EIP: Code; c013660c end_buffer_io_async+e0/11c = 0: 0f 0b ud2a = Code; c013660e end_buffer_io_async+e2/11c 2: 83 c4 0c add$0xc,%esp Code; c0136611 end_buffer_io_async+e5/11c 5: 8d 5e 28 lea0x28(%esi),%ebx Code; c0136614 end_buffer_io_async+e8/11c 8: 8d 46 2c lea0x2c(%esi),%eax Code; c0136617 end_buffer_io_async+eb/11c b: 39 46 2c cmp%eax,0x2c(%esi) Code; c013661a end_buffer_io_async+ee/11c e: 74 24 je 34 _EIP+0x34 c0136640 end_buffer_io_async+114/11c Code; c013661c end_buffer_io_async+f0/11c 10: b9 01 00 00 00mov$0x1,%ecx 1 warning issued. Results may not be reliable. ksymoops 2.3.5 on i686 2.4.0. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.0/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. kernel BUG at buffer.c:827! invalid operand: CPU: 0 EIP: 0010:[c013660c] Using defaults from ksymoops -t
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
On Thu, 07 Dec 2000 17:23:51 -0800, Joseph Cheek [EMAIL PROTECTED] wrote: i'll check it out. i'm compiling ksymoops now, is there a way to get it to work without a static libbfd? all i've got is a libbfd.so, and i'm going to need to recompile binutils if i must have a libbfd.a. ksymoops/Makefile, change $(CC) $(OBJECTS) $(CFLAGS) -Wl,-Bstatic -lbfd -liberty -Wl,-Bdynamic -o $@ to $(CC) $(OBJECTS) $(CFLAGS) -lbfd -liberty -o $@ But you should have libbfd.a. Debian splits binutils into binutils and binutils-dev, libbfd.a might be in the latter. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel BUG at buffer.c:827 in test12-pre6 and 7
Linus Torvalds wrote: It's not a new bug - it's an old bug that apparently is uncovered by a new stricter test. Apparently loopback unlocks an already unlocked page - which has always been a serious offense, but has never been detected before. test12-pre6+ detects it, and thus the BUG(). Your stack trace isn't symbolic (see Documentation/oops-tracing.txt), so it's impossible to debug, but it's already interesting information to see that it seems to be either loopback of vfat. Can you test some more? In particular, I'd love to hear if this happens with vfat even without loopback, or with loopback even without vfat (make an ext2 filesystem or similar instead). That woul dnarrow down the bug further. Thanks, Linus Hi, Here is a rather different datapoint. I hope it's different enough to help nail this. test12-pre5 + kdb + Serial Console. Sorry, I didn't get the contents of pointer args. It's probably worth saying that this kernel was compiled with gcc 2.95.2. I have the blessed egcs also, will compile pre7 with that and see what happens. Cheers, Tom from nfs mounted ext2 (2.4.0-test12-pre5): kernel BUG at buffer.c:827! Entering kdb (current=0xc2014000, pid 466) Panic: invalid operand due to panic @ 0xc0130c73 eax = 0x001c ebx = 0xc109ebf8 ecx = 0x edx = 0x0006 esi = 0xc2739af0 edi = 0xc2739b38 esp = 0xc2015dd4 eip = 0xc0130c73 ebp = 0xc2015df4 xss = 0x0018 xcs = 0xc2010010 eflags = 0x00010046 xds = 0x0018 xes = 0x0018 origeax = 0x regs = 0xc2015da0 kdb bt EBP EIP Function(args) 0xc2015df4 0xc0130c73 end_buffer_io_async+0xd3 (0xc2739af0, 0x1) kernel .text 0xc010 0xc0130ba0 0xc0130cb0 0xc2015e10 0xc0164756 end_that_request_first+0x66 (0xc11c2c20, 0x1, 0xc031cf04) kernel .text 0xc010 0xc01646f0 0xc01647b0 0xc2015e30 0xc018a128 ide_end_request+0x28 (0x1, 0xc11c5060) kernel .text 0xc010 0xc018a100 0xc018a180 0xc2015e64 0xc018e614 read_intr+0x104 (0xc031ce20) kernel .text 0xc010 0xc018e510 0xc018e650 0xc2015e88 0xc018bad6 ide_intr+0x106 (0xe, 0xc11c5060, 0xc2015ed4, 0x1c0) kernel .text 0xc010 0xc018b9d0 0xc018bb30 0xc2015ea8 0xc010ab50 handle_IRQ_event+0x30 (0xe, 0xc2015ed4, 0xc11de2e0) kernel .text 0xc010 0xc010ab20 0xc010ab80 0xc2015ecc 0xc010ace2 do_IRQ+0x72 (0xc02f4520, 0xc02a92ac, 0xc201c000, 0xc201c000, 0xfc18) kernel .text 0xc010 0xc010ac70 0xc010ad30 0xc01093f0 ret_from_intr kernel .text 0xc010 0xc01093f0 0xc0109410 Interrupt registers: eax = 0x0019 ebx = 0xc02f4520 ecx = 0xc02a92ac edx = 0xc201c000 esi = 0xc201c000 edi = 0xfc18 esp = 0xc2015f08 eip = 0xc0115816 ebp = 0xc2015f4c xss = 0x0018 xcs = 0xc010 eflags = 0x0287 xds = 0xc2070018 xes = 0xc2070018 origeax = 0xff0e regs = 0xc2015ed4 0xc0115816 schedule+0x1b6 kernel .text 0xc010 0xc0115660 0xc0115b00 0xc2015f70 0xc01155c7 schedule_timeout+0x17 (0xc2014000, 0x1785222) kernel .text 0xc010 0xc01155b0 0xc0115650 0xc2015fac 0xc4055753 [sunrpc]svc_recv+0x1a3 (0xc24b2470, 0xc207be00, 0x7fff) sunrpc .text 0xc404e060 0xc40555b0 0xc4055940 0xc2015fec 0xc40713f3 [nfsd]nfsd+0x253 nfsd .text 0xc4071060 0xc40711a0 0xc40714a0 0xc0107843 kernel_thread+0x23 kernel .text 0xc010 0xc0107820 0xc0107860 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/