Bug#392986: linux-image-2.6.16-2-em64t-p4-smp: megaraid_sas issues warnings and RESETs

2007-08-26 Thread Andrew Moise
  After being advised that upgrading to more recent firmware fixes
this problem, I've installed Dell driver update R149666, which
upgrades the Perc 5i firmware to version v5.1.1-0040.  That seems to
solve this problem for me even when I boot back into the unmodified
2.6.16 kernel.  I therefore believe that this bug should be considered
a firmware bug instead of a kernel bug, and closed in Debian's BTS.
  Thanks!


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#392986: linux-image-2.6.16-2-em64t-p4-smp: megaraid_sas issues warnings and RESETs

2006-10-17 Thread Andrew Moise

  Sumant Patro confirmed that the one-liner is a critical fix that
must be applied.  Unfortunately, it doesn't fix my problem :-(.
I've gone back to the BLKDEV_MAX_RQ workaround for now; if I learn of
a better solution, I'll send it along.
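
  For reference, the workaround amounts to changing one compile-time
constant and rebuilding the kernel.  A minimal sketch, assuming the
stock 2.6.16 header defines the value as 128 (please check your own
tree before relying on that number):

```c
/* include/linux/blkdev.h -- local workaround only, not an upstream change */
#define BLKDEV_MAX_RQ	8	/* assumed stock value: 128 */
```

  This caps the default number of requests per block queue, which is
presumably why it hides the problem rather than fixing it.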





Bug#392986: linux-image-2.6.16-2-em64t-p4-smp: megaraid_sas issues warnings and RESETs on Perc 5i in Dell PE 2950

2006-10-14 Thread Andrew Moise
Package: linux-image-2.6.16-2-em64t-p4-smp
Version: 2.6.16-18~bpo.1
Severity: normal

  I feel very bad filing a bug against a backports package, but this
backports package is (according to the changelog) an unmodified
2.6.16-18 package, just recompiled for sarge, and people on the
mailing lists are reporting problems with a variety of OSes [1], so I
have a feeling it's a genuine driver bug with this kernel version.
  In any case, the problem is that under heavy write load, I get
messages like these in /var/log/kern.log:

Oct  2 14:36:01 localhost kernel: sd 0:2:1:0: megasas: RESET -55455 cmd=2a
Oct  2 14:36:01 localhost kernel: megasas: reset successful
Oct  2 14:36:31 localhost kernel: sd 0:2:1:0: megasas: RESET -70369 cmd=2a
Oct  2 14:36:31 localhost kernel: megasas: reset successful
Oct  2 14:37:02 localhost kernel: sd 0:2:1:0: megasas: RESET -83487 cmd=2a
Oct  2 14:37:02 localhost kernel: megasas: reset successful
Oct  2 14:37:32 localhost kernel: sd 0:2:1:0: megasas: RESET -95079 cmd=2a
Oct  2 14:37:32 localhost kernel: megasas: reset successful
Oct  2 14:38:02 localhost kernel: sd 0:2:1:0: megasas: RESET -105361 cmd=2a
Oct  2 14:38:02 localhost kernel: megasas: reset successful
Oct  2 14:38:33 localhost kernel: sd 0:2:1:0: megasas: RESET -115613 cmd=2a
Oct  2 14:38:33 localhost kernel: megasas: reset successful
Oct  2 14:38:33 localhost kernel: sd 0:2:1:0: SCSI error: return code = 0x600
Oct  2 14:38:33 localhost kernel: end_request: I/O error, dev sdb, sector 2927091007
Oct  2 14:38:33 localhost kernel: Buffer I/O error on device sdb1, logical block 731772736
Oct  2 14:38:33 localhost kernel: lost page write due to I/O error on sdb1
Oct  2 14:39:03 localhost kernel: sd 0:2:1:0: megasas: RESET -125667 cmd=2a
Oct  2 14:39:03 localhost kernel: megasas: reset successful
Oct  2 14:39:33 localhost kernel: sd 0:2:1:0: megasas: RESET -135588 cmd=2a
Oct  2 14:39:33 localhost kernel: megasas: [ 0]waiting for 1 commands to complete
Oct  2 14:39:34 localhost kernel: megasas: reset successful

  A mailing list posting recommended reducing BLKDEV_MAX_RQ to 8 in
include/linux/blkdev.h as a workaround; I've tried that, and it seems
to work for me.  I suspect that the following patch is the actual fix
(from recent changes to drivers/scsi/megaraid/megaraid_sas.c):

--- a/drivers/scsi/megaraid/megaraid_sas.c 2006-03-20 00:53:29.0 -0500
+++ b/drivers/scsi/megaraid/megaraid_sas.c  2006-10-13 12:25:04.0 -0400
@@ -1716,6 +1823,12 @@
 	 * Get various operational parameters from status register
 	 */
 	instance->max_fw_cmds = instance->instancet->read_fw_status_reg(reg_set) & 0x00FFFF;
+	/*
+	 * Reduce the max supported cmds by 1. This is to ensure that the
+	 * reply_q_sz (1 more than the max cmd that driver may send)
+	 * does not exceed max cmds that the FW can support
+	 */
+	instance->max_fw_cmds = instance->max_fw_cmds-1;
 	instance->max_num_sge = (instance->instancet->read_fw_status_reg(reg_set) & 0xFF0000) >>
 					0x10;
 	/*

  ... but, of course, I'm not entirely sure what I'm doing.  This is a
production server now, but I may be able to do some amount of testing
(like installing an etch or unstable partition to test more recent
Debian kernels) from time to time over weekends or during downtime.  If
you'd like me to, let me know and I'll see what I can do.
  I'm also planning to test the above patch after consulting with some
kernel hackers.  I'll let you know how it goes.
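
  To make the arithmetic in that patch concrete, here is a rough
standalone sketch of how I understand the decoding to work.  It is not
driver code: the struct and function names are made up for
illustration, and the mask values (0x00FFFF and 0xFF0000) are my
reading of the driver source, so check them against your tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded values; not a driver structure. */
struct fw_limits {
	uint32_t max_fw_cmds;	/* commands the driver may keep outstanding */
	uint32_t max_num_sge;	/* scatter/gather entries per command */
	uint32_t reply_q_sz;	/* reply queue entries the driver asks for */
};

/* Decode a raw firmware status value the way the patched code does. */
void decode_fw_status(uint32_t status, struct fw_limits *out)
{
	assert(out != NULL);

	/* Low 16 bits: command count reported by the firmware, minus 1. */
	out->max_fw_cmds = (status & 0x00FFFF) - 1;

	/* Bits 16-23: maximum scatter/gather entries. */
	out->max_num_sge = (status & 0xFF0000) >> 0x10;

	/* The reply queue is one entry larger than max_fw_cmds. */
	out->reply_q_sz = out->max_fw_cmds + 1;
}
```

  With a made-up status value of 0x00500400, this gives max_fw_cmds =
1023, max_num_sge = 80 and reply_q_sz = 1024.  Without the decrement
the reply queue would need 1025 entries, one more than the 1024
commands the firmware reported.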

[1] http://lists.us.dell.com/pipermail/linux-poweredge/2006-October/027705.html
http://lkml.org/lkml/2006/9/6/12
http://lists.us.dell.com/pipermail/linux-poweredge/2006-August/026821.html

-- System Information:
Debian Release: 3.1
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.16+max-nr-req-8
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.16-2-em64t-p4-smp depends on:
ii  e2fsprogs                  1.37-2sarge1   ext2 file system utilities and lib
ii  initramfs-tools [linux-ini 0.80~bpo.1     tools for generating an initramfs
ii  module-init-tools          3.2.2-3~bpo.1  tools for managing Linux kernel mo

-- debconf information:
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/abort-install-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/preinst/bootloader-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/initrd-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-dir-initrd-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-initrd-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/already-running-this-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/bootloader-test-error-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/depmod-error-initrd-2.6.16-2-em64t-p4-smp: false