[PATCH] Bug in lib/brlock.c

2001-04-23 Thread Takanori Kawano


I think lib/brlock.c needs to be fixed as follows:

--- lib/brlock.c        Tue Apr 25 05:59:56 2000
+++ lib/brlock.c.fixed  Mon Apr 23 19:56:43 2001
@@ -25,7 +25,7 @@
int i;

for (i = 0; i < smp_num_cpus; i++)
-   write_lock(__brlock_array[idx] + cpu_logical_map(i));
+   write_lock(&__brlock_array[cpu_logical_map(i)][idx]);
 }

 void __br_write_unlock (enum brlock_indices idx)
@@ -33,7 +33,7 @@
int i;

for (i = 0; i < smp_num_cpus; i++)
-   write_unlock(__brlock_array[idx] + cpu_logical_map(i));
+   write_unlock(&__brlock_array[cpu_logical_map(i)][idx]);
 }

 #else /* ! __BRLOCK_USE_ATOMICS */


Without the above fix, the 2.4.1 kernel often panics during our socket
open/close stress testing.
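
To see why the old expression is wrong, here is a minimal user-space
sketch, assuming the 2.4-era shape of the array (roughly
brlock_read_lock_t __brlock_array[NR_CPUS][__BR_IDX_MAX], CPU as the
first index, lock index as the second; the toy int array below stands
in for it).  A reader on CPU c takes __brlock_array[c][idx]; the buggy
writer loop locked __brlock_array[idx][0..n-1], a row rather than a
column, so it excluded only the one reader whose CPU number happened
to equal idx:

/* Standalone illustration of the indexing bug (user space, not
 * kernel code).  The array shape is an assumption stated above. */
#include <stdio.h>

#define NR_CPUS    4
#define BR_IDX_MAX 8    /* kept >= NR_CPUS so the buggy pointer stays in bounds */

int brlock_array[NR_CPUS][BR_IDX_MAX];

int main(void)
{
        int idx = 1;    /* the brlock index being write-locked */
        int cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                /* Buggy: row idx, column cpu -- walks along one row. */
                int *wrong = brlock_array[idx] + cpu;
                /* Fixed: row cpu, column idx -- the entry readers use. */
                int *right = &brlock_array[cpu][idx];

                printf("cpu %d: buggy locks [%d][%d], fix locks [%d][%d]%s\n",
                       cpu, idx, cpu, cpu, idx,
                       wrong == right ? "  (only overlap)" : "");
        }
        return 0;
}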

regards,

---
Takanori Kawano
Hitachi Ltd,
Internet Systems Platform Division
[EMAIL PROTECTED]




Re: Kernel panics on raw I/O stress test

2001-04-20 Thread Takanori Kawano


> Could you try again with 2.4.4pre4 plus the below patch?
> 
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre2/rawio-3

I suppose that 2.4.4-pre4 plus the rawio-3 patch still has SMP-unsafe
raw I/O code and can cause the same panic I reported.

I think the following scenario is possible if there are 3 or more CPUs
(a user-space sketch of the same pattern follows the list).

(1)  CPU0 enters rw_raw_dev()
(2)  CPU0 executes alloc_kiovec(1, &iobuf)      // drivers/char/raw.c line 309
(3)  CPU0 enters brw_kiovec(rw, 1, &iobuf, ..)  // drivers/char/raw.c line 362
(4)  CPU0 enters __wait_on_buffer()
(5)  CPU0 executes run_task_queue() and waits
     while buffer_locked(bh) is true.           // fs/buffer.c lines 152-158
(6)  CPU1 enters end_buffer_io_kiobuf() with
     the iobuf allocated at (2)
(7)  CPU1 executes unlock_buffer()              // fs/buffer.c line 1994
(8)  CPU0 exits __wait_on_buffer()
(9)  CPU0 exits brw_kiovec(rw, 1, &iobuf, ..)
(10) CPU0 executes free_kiovec(1, &iobuf)       // drivers/char/raw.c line 388
(11) The task on CPU2 reuses the area freed at (10).
(12) CPU1 enters end_kio_request(), touches the corrupted iobuf,
     then panics.
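
As a user-space analogue of the same window (pthreads; every name here
is illustrative, not kernel API): the completer signals the waiter
before it is finished with the object, the waiter frees it, and the
completer's final store lands in freed memory, exactly as in steps
(7), (10) and (12) above.

/* Compile with: gcc -pthread.  Deliberately reproduces the bug. */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct request {
        pthread_mutex_t lock;
        pthread_cond_t  done;
        int             completed;
        int             result;  /* written after the wakeup: the bug */
};

static void *completer(void *arg)   /* plays CPU1 */
{
        struct request *req = arg;

        pthread_mutex_lock(&req->lock);
        req->completed = 1;
        pthread_cond_signal(&req->done);  /* step (7): waiter may run */
        pthread_mutex_unlock(&req->lock);

        usleep(1000);       /* widen the window, as a third CPU would */
        req->result = 42;   /* step (12): use after free */
        return NULL;
}

int main(void)                      /* plays CPU0 */
{
        struct request *req = malloc(sizeof(*req));
        pthread_t t;

        pthread_mutex_init(&req->lock, NULL);
        pthread_cond_init(&req->done, NULL);
        req->completed = 0;

        pthread_create(&t, NULL, completer, req);

        pthread_mutex_lock(&req->lock);   /* steps (4)-(5): wait */
        while (!req->completed)
                pthread_cond_wait(&req->done, &req->lock);
        pthread_mutex_unlock(&req->lock);

        free(req);              /* step (10): freed while still in use */
        pthread_join(t, NULL);  /* the join comes too late to help */
        return 0;
}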

---
Takanori Kawano
Hitachi Ltd,
Internet Systems Platform Division
[EMAIL PROTECTED]






Kernel panics on raw I/O stress test

2001-04-19 Thread Takanori Kawano

When I ran a raw I/O SCSI read/write test with the 2.4.1 kernel
on our IA64 8-way SMP box, the kernel panicked and the following
message was displayed:

  Aiee, killing interrupt handler!

No stack trace or register dump was displayed.

I then analyzed FSB traces around the panic and found that the
following functions were called before panic():


   CPU0:                      CPU1:

     .                          .
     .                        rw_raw_dev()
     .                          .
     .                        brw_kiovec()
     .                          .
     .                        free_kiovec()
     .                          .
   end_kio_request()
    __wake_up()
    ia64_do_page_fault()
     do_exit()
      panic()

I suppose that free_kiovec() is called on CPU1 before
end_kio_request() is called on CPU0 for the same kiobuf, and that this
resulted in the panic.
In the 2.4.1 source code, I think there is no guarantee that the
free_kiovec() in rw_raw_dev() is called only after end_kio_request()
has finished.

I tried the following two workarounds.

(1) Wait in rw_raw_dev() while io_count is positive. 

--- drivers/char/raw.c        Mon Oct  2 12:35:15 2000
+++ drivers/char/raw.c.workaround   Thu Apr 19 16:54:26 2001
@@ -333,6 +333,11 @@
break;
}

+   while (atomic_read(&iobuf->io_count)) {
+       set_task_state(current, TASK_UNINTERRUPTIBLE);
+       schedule();
+   }
+
free_kiovec(1, &iobuf);

if (transferred) {
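
One caveat on this loop: it sets TASK_UNINTERRUPTIBLE without putting
the task on any wait queue, so it depends on something else making the
task runnable again.  Below is a sketch of the same wait done against
the kiobuf's own wait queue instead, assuming the 2.4-era struct
kiobuf fields (io_count, wait_queue) and that end_kio_request() does a
wake_up() on that queue when io_count drops to zero; the helper name
is hypothetical.

/* Hypothetical helper, sketched for drivers/char/raw.c.  Assumes
 * end_kio_request() does wake_up(&iobuf->wait_queue) when the last
 * sub-I/O completes; if it does not, this sleeps forever. */
static void wait_for_kiobuf_io(struct kiobuf *iobuf)
{
        struct task_struct *tsk = current;
        DECLARE_WAITQUEUE(wait, tsk);

        add_wait_queue(&iobuf->wait_queue, &wait);
        for (;;) {
                /* Set the state before testing io_count so a wake_up()
                 * between the test and schedule() is not lost. */
                set_task_state(tsk, TASK_UNINTERRUPTIBLE);
                if (!atomic_read(&iobuf->io_count))
                        break;
                run_task_queue(&tq_disk);   /* kick queued disk I/O */
                schedule();
        }
        set_task_state(tsk, TASK_RUNNING);
        remove_wait_queue(&iobuf->wait_queue, &wait);
}

rw_raw_dev() would then call wait_for_kiobuf_io(iobuf) in place of the
open-coded loop, just before free_kiovec(1, &iobuf).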



(2) Keep the buffer lock held until end_kio_request() is done.

--- fs/buffer.c   Tue Jan 16 05:42:32 2001
+++ fs/buffer.c.workaround  Thu Apr 19 17:22:19 2001
@@ -1990,8 +1990,8 @@
mark_buffer_uptodate(bh, uptodate);

kiobuf = bh->b_private;
-   unlock_buffer(bh);
end_kio_request(kiobuf, uptodate);
+   unlock_buffer(bh);
 }
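
Rendered as the whole 2.4.1 completion handler rather than a two-line
diff (a sketch; field and helper names as in 2.4.1 fs/buffer.c), the
reordering looks like this.  The point is that brw_kiovec() on the
submitting CPU cannot get past __wait_on_buffer(), and so rw_raw_dev()
cannot reach free_kiovec(), until unlock_buffer() runs:

static void end_buffer_io_kiobuf(struct buffer_head *bh, int uptodate)
{
        struct kiobuf *kiobuf;

        mark_buffer_uptodate(bh, uptodate);

        kiobuf = bh->b_private;
        /* Finish all kiobuf accounting first ... */
        end_kio_request(kiobuf, uptodate);
        /* ... and only then let __wait_on_buffer() in brw_kiovec()
         * return, so free_kiovec() in rw_raw_dev() cannot run while
         * the kiobuf is still being touched. */
        unlock_buffer(bh);
}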



Both of them worked well in our raw I/O testing, but I'm not sure
they are correct.

Does anybody have comments? 

regards,

---
Takanori Kawano
Hitachi Ltd,
Internet Systems Platform Division
[EMAIL PROTECTED]
