[PATCH] Bug in lib/brlock.c
I think lib/brlock.c needs to be fixed as follows:

--- lib/brlock.c	Tue Apr 25 05:59:56 2000
+++ lib/brlock.c.fixed	Mon Apr 23 19:56:43 2001
@@ -25,7 +25,7 @@
 	int i;
 
 	for (i = 0; i < smp_num_cpus; i++)
-		write_lock(__brlock_array[idx] + cpu_logical_map(i));
+		write_lock(&__brlock_array[cpu_logical_map(i)][idx]);
 }
 
 void __br_write_unlock (enum brlock_indices idx)
@@ -33,7 +33,7 @@
 	int i;
 
 	for (i = 0; i < smp_num_cpus; i++)
-		write_unlock(__brlock_array[idx] + cpu_logical_map(i));
+		write_unlock(&__brlock_array[cpu_logical_map(i)][idx]);
 }
 
 #else /* ! __BRLOCK_USE_ATOMICS */

Because of this bug, the 2.4.1 kernel often panics during our socket
open/close stress testing.

regards,
---
Takanori Kawano
Hitachi Ltd,
Internet Systems Platform Division
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
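To make the indexing difference in the patch above concrete, here is a minimal user-space sketch. It assumes the array is laid out as [cpu][lock index], which is what the patched expression implies. The sizes and the names NR_CPUS_SKETCH, BR_IDX_MAX_SKETCH, old_slot(), new_slot() and wrong_slots() are hypothetical and not taken from the kernel source. The functions only compute which flat array element each expression would reach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only to make the arithmetic visible. */
#define NR_CPUS_SKETCH    4   /* stands in for NR_CPUS */
#define BR_IDX_MAX_SKETCH 2   /* stands in for __BR_IDX_MAX */

/*
 * Flat element index reached by the old expression
 *     __brlock_array[idx] + cpu
 * which selects row `idx` and then steps `cpu` elements forward.
 * Assumed layout: array[NR_CPUS_SKETCH][BR_IDX_MAX_SKETCH].
 */
size_t old_slot(size_t idx, size_t cpu)
{
    assert(idx < BR_IDX_MAX_SKETCH && cpu < NR_CPUS_SKETCH);
    return idx * BR_IDX_MAX_SKETCH + cpu;
}

/*
 * Flat element index reached by the patched expression
 *     &__brlock_array[cpu][idx]
 * which selects row `cpu` and column `idx`.
 */
size_t new_slot(size_t idx, size_t cpu)
{
    assert(idx < BR_IDX_MAX_SKETCH && cpu < NR_CPUS_SKETCH);
    return cpu * BR_IDX_MAX_SKETCH + idx;
}

/* Number of CPUs for which the old expression picks a different element. */
size_t wrong_slots(size_t idx)
{
    size_t cpu, wrong = 0;

    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        if (old_slot(idx, cpu) != new_slot(idx, cpu))
            wrong++;
    return wrong;
}
```

With these assumed sizes and idx = 0, the old loop reaches flat elements 0, 1, 2 and 3, that is both locks of CPU 0 and CPU 1, and never the locks of CPU 2 and CPU 3. The patched expression reaches elements 0, 2, 4 and 6, one per CPU.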
Re: Kernel panics on raw I/O stress test
> Could you try again with 2.4.4pre4 plus the below patch?
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre2/rawio-3

I suspect that 2.4.4-pre4 + the rawio-3 patch still has SMP-unsafe
raw I/O code and can cause the same panic I reported.

I think the following scenario is possible if there are 3 or more CPUs.

(1) CPU0 enters rw_raw_dev()
(2) CPU0 executes alloc_kiovec(1, &iobuf)   // drivers/char/raw.c line 309
(3) CPU0 enters brw_kiovec(rw, 1, &iobuf,..)   // drivers/char/raw.c line 362
(4) CPU0 enters __wait_on_buffer()
(5) CPU0 executes run_task_queue() and waits while buffer_locked(bh) is true.   // fs/buffer.c line 152-158
(6) CPU1 enters end_buffer_io_kiobuf() with the iobuf allocated at (2)
(7) CPU1 executes unlock_buffer()   // fs/buffer.c line 1994
(8) CPU0 exits __wait_on_buffer()
(9) CPU0 exits brw_kiovec(rw, 1, &iobuf,..)
(10) CPU0 executes free_kiovec(1, &iobuf)   // drivers/char/raw.c line 388
(11) The task on CPU2 reuses the area freed at (10).
(12) CPU1 enters end_kio_request() and touches the corrupted iobuf, then panics.

---
Takanori Kawano
Hitachi Ltd,
Internet Systems Platform Division
[EMAIL PROTECTED]
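The interleaving in steps (1) to (12) can be replayed in a small single-threaded model. This is only a sketch of the ordering argument, not kernel code. struct race_model and all function names below are hypothetical, and the reuse of the freed area by CPU2 in step (11) is reduced to a flag saying the iobuf is no longer live.

```c
#include <stdbool.h>

/* Hypothetical model of the state shared by the CPUs in the scenario. */
struct race_model {
    bool buffer_locked;  /* models buffer_locked(bh) */
    bool iobuf_live;     /* true between alloc_kiovec() and free_kiovec() */
    bool touched_freed;  /* set if end_kio_request() sees a freed iobuf */
};

/* Steps (1)-(3): the iobuf is allocated and the buffer I/O is in flight. */
void model_init(struct race_model *m)
{
    m->buffer_locked = true;
    m->iobuf_live = true;
    m->touched_freed = false;
}

/*
 * Steps (4)-(10) on CPU0: it can leave __wait_on_buffer() only once the
 * buffer is unlocked, and then frees the iobuf. Returns false while CPU0
 * would still be waiting.
 */
bool cpu0_wait_then_free(struct race_model *m)
{
    if (m->buffer_locked)
        return false;
    m->iobuf_live = false;
    return true;
}

/* Step (7) on CPU1. */
void cpu1_unlock_buffer(struct race_model *m)
{
    m->buffer_locked = false;
}

/* Step (12) on CPU1: records whether the iobuf was already freed. */
void cpu1_end_kio_request(struct race_model *m)
{
    if (!m->iobuf_live)
        m->touched_freed = true;
}

/* Replays the reported order; returns true if a freed iobuf was touched. */
bool replay_reported_interleaving(void)
{
    struct race_model m;

    model_init(&m);
    cpu1_unlock_buffer(&m);          /* (6)-(7) */
    (void)cpu0_wait_then_free(&m);   /* (8)-(10), (11) implied */
    cpu1_end_kio_request(&m);        /* (12) */
    return m.touched_freed;
}
```

In this model nothing stops CPU0 from freeing the iobuf once the buffer is unlocked, so the replay ends with end_kio_request() running on a freed iobuf.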
Kernel panics on raw I/O stress test
When I ran a raw I/O SCSI read/write test with the 2.4.1 kernel
on our IA64 8-way SMP box, the kernel panicked and the following
message was displayed.

 Aiee, killing interrupt handler!

No stack trace or register dump was displayed.

I then analyzed FSB traces around the panic and found that
the following functions were called before panic().

CPU0:                           CPU1:
                                rw_raw_dev()
                                 brw_kiovec()
                                 free_kiovec()
 end_kio_request()
 __wake_up()
 ia64_do_page_fault()
 do_exit()
 panic()

I suppose that free_kiovec() was called on CPU1 before
end_kio_request() was called on CPU0 for the same kiobuf,
which resulted in the panic.

In the 2.4.1 source code, I think there is no assurance that
free_kiovec() in rw_raw_dev() is called after end_kio_request()
is done.

I tried the following two workarounds.

(1) Wait in rw_raw_dev() while io_count is positive.

--- drivers/char/raw.c	Mon Oct  2 12:35:15 2000
+++ drivers/char/raw.c.workaround	Thu Apr 19 16:54:26 2001
@@ -333,6 +333,11 @@
 			break;
 	}
 
+	while(atomic_read(&iobuf->io_count)) {
+		set_task_state(current, TASK_UNINTERRUPTIBLE);
+		schedule();
+	}
+
 	free_kiovec(1, &iobuf);
 
 	if (transferred) {

(2) Keep the buffer lock until end_kio_request() is done.

--- fs/buffer.c	Tue Jan 16 05:42:32 2001
+++ fs/buffer.c.workaround	Thu Apr 19 17:22:19 2001
@@ -1990,8 +1990,8 @@
 	mark_buffer_uptodate(bh, uptodate);
 
 	kiobuf = bh->b_private;
-	unlock_buffer(bh);
 	end_kio_request(kiobuf, uptodate);
+	unlock_buffer(bh);
 }

Both of them worked well for our raw I/O testing, but
I'm not sure they are right.

Does anybody have comments?

regards,
---
Takanori Kawano
Hitachi Ltd,
Internet Systems Platform Division
[EMAIL PROTECTED]
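As a rough check of why both workarounds close the window, the sketch below enumerates the points at which CPU0 could run free_kiovec() relative to the two steps of the completion handler. It is a simplified model under stated assumptions. end_kio_request() is treated as a single atomic step that drops io_count from 1 to 0, which the real function may not be. struct variant, the three variant constants and use_after_free_possible() are hypothetical names.

```c
#include <stdbool.h>

/* The two steps the completion handler performs on the other CPU. */
enum action { UNLOCK_BUFFER, END_KIO_REQUEST };

/* Hypothetical description of one code variant. */
struct variant {
    enum action handler_order[2];  /* order of the handler's two steps */
    bool wait_for_io_count;        /* rw_raw_dev() also waits for io_count == 0 */
};

/* 2.4.1 as described in the report. */
const struct variant ORIGINAL     = { { UNLOCK_BUFFER, END_KIO_REQUEST }, false };
/* Workaround (1): wait in rw_raw_dev() while io_count is positive. */
const struct variant WORKAROUND_1 = { { UNLOCK_BUFFER, END_KIO_REQUEST }, true };
/* Workaround (2): unlock the buffer only after end_kio_request(). */
const struct variant WORKAROUND_2 = { { END_KIO_REQUEST, UNLOCK_BUFFER }, false };

/*
 * Returns true if some interleaving lets the submitting CPU free the
 * kiobuf while end_kio_request() has not run yet.
 */
bool use_after_free_possible(const struct variant *v)
{
    int done, i;

    /* `done` = number of handler steps completed when the free is tried. */
    for (done = 0; done <= 2; done++) {
        bool locked = true;
        int io_count = 1;
        bool may_free;

        for (i = 0; i < done; i++) {
            if (v->handler_order[i] == UNLOCK_BUFFER)
                locked = false;
            else
                io_count = 0;
        }

        /* The submitter leaves __wait_on_buffer() only when unlocked. */
        may_free = !locked && (!v->wait_for_io_count || io_count == 0);

        /* io_count still 1 means end_kio_request() is yet to touch it. */
        if (may_free && io_count != 0)
            return true;
    }
    return false;
}
```

Under these assumptions the original ordering has one bad interleaving (free between unlock_buffer() and end_kio_request()), while neither workaround has any. The model says nothing about whether end_kio_request() still touches the kiobuf after decrementing io_count, so it does not show that workaround (1) is fully safe.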