Re: aacraid: kernel: AAC: Host adapter dead -1 (bisected)
On Tuesday 17 of January 2017, Dave Carroll wrote:
> > Hi.
> >
> > There is a bug with handling of adaptec raid cards (in my case it is
> > Adaptec 3405) where kernel logs hundreds of "AAC: Host adapter dead -1"
> > messages.
> >
> > Bug was reported previously on lkml but there was no progress in solving
> > it.
> >
> > There is also bugzilla entry:
> > https://bugzilla.kernel.org/show_bug.cgi?id=151661
> >
> > I've bisected that to commit below and indeed, reverting it from kernel
> > 4.9.3 makes messages go away.
> >
> > Could anyone at microsemi look at this regression?
> >
> > Thanks
>
> Hi Arkadiusz,
>
> Thanks for your effort in determining the cause of the issue. It makes
> sense now that the patch should have been included in controller specific
> code, rather than common code.
>
> I will prepare a patch for this, and if you are willing to test it, that
> would be great!

Great! I have a dedicated machine for testing this, so yes - I'll test.

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
aacraid: kernel: AAC: Host adapter dead -1 (bisected)
Hi.

There is a bug with handling of adaptec raid cards (in my case it is
Adaptec 3405) where kernel logs hundreds of "AAC: Host adapter dead -1"
messages.

Bug was reported previously on lkml but there was no progress in solving it.

There is also bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=151661

I've bisected that to commit below and indeed, reverting it from kernel
4.9.3 makes messages go away.

Could anyone at microsemi look at this regression?

Thanks

commit 78cbccd3bd683c295a44af8050797dc4a41376ff
Author: Raghava Aditya Renukunta
Date:   Mon Apr 25 23:32:37 2016 -0700

    aacraid: Fix for KDUMP driver hang

    When KDUMP is triggered the driver first talks to the firmware in INTX
    mode, but the adapter firmware is still in MSIX mode. Therefore the first
    driver command hangs since the driver is waiting for an INTX response and
    firmware gives a MSIX response. If when the OS is installed on a RAID
    drive created by the adapter KDUMP will hang since the driver does not
    receive a response in sync mode.

    Fixed by: Change the firmware to INTX mode if it is in MSIX mode before
    sending the first sync command.

    Cc: sta...@vger.kernel.org
    Signed-off-by: Raghava Aditya Renukunta
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Martin K. Petersen

my hardware:

02:0e.0 RAID bus controller [0104]: Adaptec AAC-RAID [9005:0285]
        Subsystem: Adaptec 3405 [9005:02bb]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
runtime change of use_blk_mq ?
Hi.

Is runtime enabling/disabling of blk-mq supported? Doesn't seem to work here:

4.1.30 (but the same thing on 4.6.3):

# zcat /proc/config.gz |grep _MQ_DEF
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set

# cat /sys/module/scsi_mod/parameters/use_blk_mq
N
# grep "" /sys/block/sd*/queue/scheduler
/sys/block/sda/queue/scheduler:noop [deadline] cfq
/sys/block/sdb/queue/scheduler:noop [deadline] cfq
/sys/block/sdc/queue/scheduler:noop [deadline] cfq
/sys/block/sdd/queue/scheduler:noop [deadline] cfq
/sys/block/sde/queue/scheduler:noop [deadline] cfq
/sys/block/sdf/queue/scheduler:noop [deadline] cfq
/sys/block/sdg/queue/scheduler:noop [deadline] cfq

# echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
# cat /sys/module/scsi_mod/parameters/use_blk_mq
Y
# grep "" /sys/block/sd*/queue/scheduler
/sys/block/sda/queue/scheduler:noop [deadline] cfq
/sys/block/sdb/queue/scheduler:noop [deadline] cfq
/sys/block/sdc/queue/scheduler:noop [deadline] cfq
/sys/block/sdd/queue/scheduler:noop [deadline] cfq
/sys/block/sde/queue/scheduler:noop [deadline] cfq
/sys/block/sdf/queue/scheduler:noop [deadline] cfq
/sys/block/sdg/queue/scheduler:noop [deadline] cfq

so use_blk_mq is Y but queue/scheduler for existing devices still contains
I/O schedulers and shows deadline as active. Looks like blk-mq is not active.

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
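The check done by eye in the message above (which scheduler is shown in brackets for each device) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the input has the `path:contents` shape that `grep ""` prints above.

```python
import re
from typing import Dict, Optional


def active_scheduler(contents: str) -> Optional[str]:
    """Return the bracketed entry of a queue/scheduler line, or None if no entry is bracketed."""
    match = re.search(r"\[([^\]]+)\]", contents)
    return match.group(1) if match else None


def parse_grep_output(text: str) -> Dict[str, Optional[str]]:
    """Map each 'path:contents' row (the grep "" output format) to its active scheduler.

    Assumption: paths contain no ':' themselves, which holds for /sys/block/sdX/queue/scheduler.
    """
    result = {}
    for row in text.splitlines():
        path, sep, contents = row.partition(":")
        if sep:
            result[path] = active_scheduler(contents)
    return result
```

Feeding it the seven lines above yields `deadline` for every device both before and after the `echo Y`, which is the observation the message is built on.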
aacraid and rotational 1 even for ssd disks
Hi.

I wonder if aacraid shouldn't properly inform the kernel about SSD disks via
the /sys/devices/../queue/rotational flag?

I'm using a 3.18.22 kernel, and the aacraid driver tells linux that all drives
are rotational:

raid:
# cat /sys/devices/pci:80/:80:03.0/:81:00.0/host10/target10:0:0/10:0:0:0/block/sda/queue/rotational
1

lvm on top of that raid:
# cat /sys/devices/virtual/block/dm-*/queue/rotational
1
1
1

while this system has a raid array of SSD drives only (INTEL SSDSC2BB30). No
rotational disks.

Adaptec ASR8405 7.8-0 (32730) firmware.

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
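To survey the same flag across all block devices at once, something like the following Python sketch could be used. The function name is hypothetical, and it assumes the usual `<device>/queue/rotational` layout under the directory passed in (normally `/sys/block`).

```python
from pathlib import Path
from typing import Dict


def rotational_flags(sys_block: Path) -> Dict[str, bool]:
    """Return {device name: True if the kernel reports the device as rotational}.

    Assumption: sys_block follows the sysfs layout <device>/queue/rotational,
    with the file holding "1" (rotational) or "0" (non-rotational).
    """
    flags = {}
    for flag_file in sorted(sys_block.glob("*/queue/rotational")):
        device = flag_file.parent.parent.name
        flags[device] = flag_file.read_text().strip() == "1"
    return flags
```

Calling `rotational_flags(Path("/sys/block"))` on the machine described above would be expected to report `True` for sda and the dm-* devices, matching the `cat` output in the message.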
help decoding aacraid errors (3.10.40 kernel)
Hello.

I'm using a 3.10.40 kernel with an adaptec 3405 and unfortunately I'm getting
I/O errors and dmesg errors like below.

Is there a howto on decoding these errors, or on what they actually mean?

Some details, including full dmesg output, at
http://ixion.pld-linux.org/~arekm/p2/web3-adaptec-problem/
eg http://ixion.pld-linux.org/~arekm/p2/web3-adaptec-problem/dmesg.txt

That doesn't seem to be a hard disk error (at least I don't see any bad errors
from the arcconf tool).

[3757350.671843] sd 0:0:2:0: [sdc] CDB:
[3757350.671844] cdb[0]=0x28: 28 00 00 a0 2c a8 00 00 08 00
[3757350.671856] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671858] sd 0:0:2:0: [sdc]
[3757350.671860] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671862] sd 0:0:2:0: [sdc]
[3757350.671863] Sense Key : 0x4 [current]
[3757350.671866] sd 0:0:2:0: [sdc]
[3757350.671868] ASC=0x44 ASCQ=0x0
[3757350.671870] sd 0:0:2:0: [sdc] CDB:
[3757350.671872] cdb[0]=0x28: 28 00 00 a0 2c b0 00 00 08 00
[3757350.671884] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671886] sd 0:0:2:0: [sdc]
[3757350.671888] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671890] sd 0:0:2:0: [sdc]
[3757350.671892] Sense Key : 0x4 [current]
[3757350.671894] sd 0:0:2:0: [sdc]
[3757350.671896] ASC=0x44 ASCQ=0x0
[3757350.671899] sd 0:0:2:0: [sdc] CDB:
[3757350.671900] cdb[0]=0x28: 28 00 00 a0 2c b8 00 00 08 00
[3757350.671912] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671914] sd 0:0:2:0: [sdc]
[3757350.671916] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671918] sd 0:0:2:0: [sdc]
[3757350.671920] Sense Key : 0x4 [current]
[3757350.671923] sd 0:0:2:0: [sdc]
[3757350.671924] ASC=0x44 ASCQ=0x0
[3757350.671927] sd 0:0:2:0: [sdc] CDB:
[3757350.671928] cdb[0]=0x28: 28 00 00 a0 2c c0 00 00 08 00
[3757350.671939] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671941] sd 0:0:2:0: [sdc]
[3757350.671943] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671945] sd 0:0:2:0: [sdc]
[3757350.671947] Sense Key : 0x4 [current]
[3757350.671950] sd 0:0:2:0: [sdc]
[3757350.671951] ASC=0x44 ASCQ=0x0
[3757350.671954] sd 0:0:2:0: [sdc] CDB:
[3757350.671955] cdb[0]=0x28: 28 00 00 a0 2c c8 00 00 08 00

--
Arkadiusz Miśkiewicz, arekm / maven.pl
Q: vger.kernel.org postmasters - always rude and impolite, hmm?
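As a rough aid for reading the `cdb[0]=0x28: ...` lines, here is a minimal Python sketch that unpacks a 10-byte READ(10)/WRITE(10) CDB into its logical block address and transfer length. The function name and the returned dictionary keys are made up for illustration; the byte layout used (big-endian LBA in bytes 2-5, length in bytes 7-8) is the standard 10-byte read/write layout.

```python
# Opcodes handled by this sketch; other opcodes have different layouts.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_hex: str) -> dict:
    """Decode a space-separated hex dump of a READ(10)/WRITE(10) CDB.

    Hypothetical helper: takes the hex bytes as printed after 'cdb[0]=0x28:'.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a 10-byte READ(10)/WRITE(10) CDB")
    return {
        "op": OPCODES[cdb[0]],
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2..5
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7..8
    }
```

For the first CDB in the log, `decode_rw10("28 00 00 a0 2c a8 00 00 08 00")` gives a READ(10) of 8 blocks at LBA 10497192, and the following CDBs advance by 8 blocks each, so the failures cover one contiguous region of sdc.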
Re: help decoding aacraid errors (3.10.40 kernel)
On Friday 27 of June 2014, Bryn M. Reeves wrote:
> On Fri, Jun 27, 2014 at 10:55:08AM +0200, Arkadiusz Miskiewicz wrote:
> > [3757350.671860] Result: hostbyte=0x00 driverbyte=0x08
> > [3757350.671862] sd 0:0:2:0: [sdc]
> > [3757350.671863] Sense Key : 0x4 [current]
>
> http://www.t10.org/lists/2sensekey.htm
>
> 0x4 is hardware error.
>
> > [3757350.671866] sd 0:0:2:0: [sdc]
> > [3757350.671868] ASC=0x44 ASCQ=0x0
>
> http://www.t10.org/lists/asc-num.htm
>
> 0x44/0x00 is internal target failure.

Thanks for links. I wonder why kernel doesn't decode these to be actually
readable without a need for asking on ml - was decoding considered?

Anyway Adaptec support helped with details:
http://ask.adaptec.com/app/answers/detail/a_id/14947/track/AvOF~QqxDv8S~TbvGmIW~yLb_fsq5C75Mv~s~zj~PP8l

Basically this controller finds bad stripes and blocks any access to these
areas resulting in errors :-/ What's more interesting there is no recovery
procedure beside recreation. Fun :-)

> Regards,
> Bryn.

--
Arkadiusz Miśkiewicz, arekm / maven.pl
Q: vger.kernel.org postmasters - always rude and impolite, hmm?
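The lookup Bryn did by hand against the T10 lists can be sketched as a small Python table. This is not the kernel's decoder. The sense key names are the standard SPC ones, the ASC/ASCQ table deliberately holds only the single pair discussed in this thread, and the function name is hypothetical.

```python
# Standard SPC sense key names (0xC and 0xF left out of this sketch).
SENSE_KEYS = {
    0x0: "No Sense", 0x1: "Recovered Error", 0x2: "Not Ready",
    0x3: "Medium Error", 0x4: "Hardware Error", 0x5: "Illegal Request",
    0x6: "Unit Attention", 0x7: "Data Protect", 0x8: "Blank Check",
    0x9: "Vendor Specific", 0xA: "Copy Aborted", 0xB: "Aborted Command",
    0xD: "Volume Overflow", 0xE: "Miscompare",
}

# Only the pair from this thread; the full list lives at t10.org.
ASC_ASCQ = {(0x44, 0x00): "Internal target failure"}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Turn numeric sense data into the text a reader would look up by hand."""
    key_text = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC=0x{asc:02x} ASCQ=0x{ascq:02x}")
    return f"{key_text}: {detail}"
```

With the values from the log, `describe_sense(0x4, 0x44, 0x0)` returns `Hardware Error: Internal target failure`.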
Re: help decoding aacraid errors (3.10.40 kernel)
On Friday 27 of June 2014, Bryn M. Reeves wrote:
> On Fri, Jun 27, 2014 at 12:59:18PM +0200, Arkadiusz Miskiewicz wrote:
> > Thanks for links. I wonder why kernel doesn't decode these to be actually
> > readable without a need for asking on ml - was decoding considered?
>
> Normally it does; I was a bit surprised to see numbers printed with such a
> recent kernel. Sense key decoding to text has been around almost forever
> (the 'snstext' table of sense strings pre-dates git, i.e. 2.6.12ish).
>
> Is it possible your kernel was built without CONFIG_SCSI_CONSTANTS?

Bingo! Thanks for your help.

--
Arkadiusz Miśkiewicz, arekm / maven.pl
Q: vger.kernel.org postmasters - always rude and impolite, hmm?
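Whether a given option such as CONFIG_SCSI_CONSTANTS is enabled can be checked from the kernel config text. The sketch below is a hypothetical helper, assuming the usual `.config` line formats (`CONFIG_X=y` or `# CONFIG_X is not set`).

```python
def config_state(config_text: str, option: str) -> str:
    """Return the option's value ('y', 'm', ...), 'n' if explicitly unset, or 'absent'."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "absent"
```

On a kernel built with /proc/config.gz support, the text can be read with `gzip.open("/proc/config.gz", "rt").read()`. A result other than `y` for `CONFIG_SCSI_CONSTANTS` would be consistent with the undecoded numeric sense data seen earlier in this thread.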
Re: mptsas related problem [2.6.22], LSI SAS1064ET PCI-E
On Monday 10 of September 2007, Arkadiusz Miskiewicz wrote:
> Hello,
>
> SR2520SAXS platform (S5000VSA mainboard in 2U SR2520 chassis) with
> 08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET
> PCI-Express Fusion-MPT SAS (rev 02)
> onboard (well afaik on backplane) and four SATA discs in various software
> raids level (1, 5 and 10) per partition.
>
> Under bigger I/O load mptsas kicks out hdd disks like:
> mptsas: ioc0: removing sata device, channel 0, id 24, phy 2

I was able to reproduce it on second machine (also SR2520SAXS platform) in
~2 hours of heavy IO with other Samsung hard disks (400GB, previously 320GB).

I also tested WDC WD3200YS-01P 320GB hard disk but the problem didn't occur
here even after 20h of testing. Looks like it's triggered only with some hard
disks. I'm going to test Seagate disks now.

mptscsih: ioc0: attempting task abort! (sc=810158ecb540)
sd 0:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 0a ad 28 de 00 00 10 00
mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403)
mptbase: ioc0: LogInfo(0x3114): Originator={PL}, Code={IO Executed}, SubCode(0x)
mptscsih: ioc0: task abort: SUCCESS (sc=810158ecb540)
mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403)
mptscsih: ioc0: attempting target reset! (sc=810158ecb540)
sd 0:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 0a ad 28 de 00 00 10 00
mptscsih: ioc0: target reset: SUCCESS (sc=810158ecb540)
mptscsih: ioc0: attempting task abort! (sc=81006ea80b40)
sd 0:0:1:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 16 21 3c 00 00 0a 00
mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403)
mptbase: ioc0: LogInfo(0x3114): Originator={PL}, Code={IO Executed}, SubCode(0x)
mptscsih: ioc0: task abort: SUCCESS (sc=81006ea80b40)
mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403)
mptbase: ioc0: LogInfo(0x3000): Originator={PL}, Code={Reset}, SubCode(0x1000)
mptbase: ioc0: LogInfo(0x3000): Originator={PL}, Code={Reset}, SubCode(0x1000)
mptscsih: ioc0: attempting target reset! (sc=81006ea80b40)
sd 0:0:1:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 16 21 3c 00 00 0a 00
mptscsih: ioc0: target reset: SUCCESS (sc=81006ea80b40)
mptscsih: ioc0: attempting task abort! (sc=81006ea80b40)
sd 0:0:1:0: [sdb] CDB: cdb[0]=0x0: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x3113): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x)
mptscsih: ioc0: task abort: SUCCESS (sc=81006ea80b40)
mptscsih: ioc0: attempting bus reset! (sc=81006ea80b40)
sd 0:0:1:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 16 21 3c 00 00 0a 00
mptscsih: ioc0: bus reset: SUCCESS (sc=81006ea80b40)
mptscsih: ioc0: attempting task abort! (sc=81006ea80b40)
sd 0:0:1:0: [sdb] CDB: cdb[0]=0x0: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x3113): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x)
mptscsih: ioc0: task abort: SUCCESS (sc=81006ea80b40)
mptscsih: ioc0: attempting task abort! (sc=81006ea806c0)
sd 0:0:1:0: [sdb] CDB: cdb[0]=0x0: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x3113): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x)
mptscsih: ioc0: task abort: SUCCESS (sc=81006ea806c0)
mptscsih: ioc0: Attempting host reset! (sc=81006ea80b40)
mptbase: Initiating ioc0 recovery
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 35004742
raid10: Disk failure on sdb3, disabling device.
        Operation continuing on 3 devices
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 35004732
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 50036487
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 57635839
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 75634150
raid5: Disk failure on sdb4, disabling device.
        Operation continuing on 3 devices
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 82503262
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 83868190
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 105214542
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 131203934
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 161928270
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 161928398
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 175936606
sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
end_request: I/O error, dev sdb, sector 187551886
sd 0:0:1:0: [sdb] Result
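The `LogInfo(0x31120403)` values in the log above appear to be packed bit fields. The Python sketch below splits one into its parts. The field layout (top nibble bus type, next nibble originator, then an 8-bit code and a 16-bit subcode) is an assumption, chosen because it matches what the log itself prints for this value: `Originator={PL}` and `SubCode(0x0403)`. The function name and the originator table are hypothetical.

```python
# Assumed originator numbering; only PL (1) is confirmed by the log above.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(value: int) -> dict:
    """Split a 32-bit LogInfo word into assumed fields (layout is an assumption)."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }
```

For `0x31120403` this yields bus type 3, originator PL, code 0x12 and subcode 0x0403, which agrees with the decoded text the driver printed next to that value.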
Re: mptsas related problem [2.6.22], LSI SAS1064ET PCI-E
On Tuesday 18 of September 2007, Moore, Eric wrote:
> On Monday, September 10, 2007 11:56 AM, Arkadiusz Miskiewicz wrote:
> > SR2520SAXS platform (S5000VSA mainboard in 2U SR2520 chassis) with
> > 08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET
> > PCI-Express Fusion-MPT SAS (rev 02)
>
> You have a 1064E B1 part. How much system memory do you have?

4GB on the first and 1GB on the second.

> > ps. If someone knows how FwRev=0110h relates to firmware version at
> > http://downloadcenter.intel.com/filter_results.aspx?strTypes=allProductID=2487OSFullName=OS+Independentlang=engstrOSs=38submit=Go%21
> > then that's also interesting. Simply 01 10 ...== 1.16.0.0 ?
>
> 1.16 firmware is Phase 7 firmware, which is over a year old. Perhaps you
> could attempt upgrading by obtaining it from
> http://www.lsi.com/support/index.html or sending a request to
> [EMAIL PROTECTED] Phase 11 just released.

The question is: is it safe to use lsi.com firmware for the Intel SR2520SAXS
platform, where the SAS1064ET resides on the backplane (afaik)?
(There is no new firmware on the intel website.)

> > sd 0:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 0a ad 28 de 00 00 10 00
> > mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403)
>
> 0x403 means FRAME_XFER_ERROR, which says your command packet didn't make it.
>
> > mptbase: ioc0: LogInfo(0x3000): Originator={PL}, Code={Reset}, SubCode(0x1000)
>
> 0x1000 means SATA_INIT_TIMEOUT
>
> Can you obtain a SAS trace?

Probably, just tell me how to get a sas trace.

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
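The reading of `FwRev=0110h` as version 1.16 can be written out as a short Python sketch. It only encodes the interpretation suggested in this thread (high byte as major, low byte as minor, both taken from the hex value); whether further bytes exist or how they map to the "1.16.0.0" form is not known from the messages, and the function name is hypothetical.

```python
def fwrev_to_version(fwrev: str) -> str:
    """Convert a FwRev string such as '0110h' to 'major.minor'.

    Assumption (from the thread): 0x01 is the major and 0x10 (decimal 16) the minor.
    """
    value = int(fwrev.lower().rstrip("h"), 16)
    major, minor = value >> 8, value & 0xFF
    return f"{major}.{minor:02d}"
```

Here `fwrev_to_version("0110h")` gives `1.16`, the version the reply above identifies as Phase 7 firmware.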
mptsas related problem [2.6.22], LSI SAS1064ET PCI-E
Hello,

SR2520SAXS platform (S5000VSA mainboard in 2U SR2520 chassis) with
08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET
PCI-Express Fusion-MPT SAS (rev 02)
onboard (well afaik on backplane) and four SATA discs in various software
raids level (1, 5 and 10) per partition.

Under bigger I/O load mptsas kicks out hdd disks like:
mptsas: ioc0: removing sata device, channel 0, id 24, phy 2

So far it dropped sdb twice and sdc once. Not sure if these are faulty
hard drives or controller problems or driver problems.

Any hints?

ps. If someone knows how FwRev=0110h relates to firmware version at
http://downloadcenter.intel.com/filter_results.aspx?strTypes=allProductID=2487OSFullName=OS+Independentlang=engstrOSs=38submit=Go%21
then that's also interesting. Simply 01 10 ...== 1.16.0.0 ?

mptbase: Initiating ioc0 bringup
ioc0: SAS1064E: Capabilities={Initiator}
PCI: Setting latency timer of device :08:00.0 to 64
scsi0 : ioc0: LSISAS1064E, FwRev=0110h, Ports=1, MaxQ=511, IRQ=16
Device driver host0 lacks bus and class support for being resumed.
Device driver phy-0:0 lacks bus and class support for being resumed.
Device driver port-0:0 lacks bus and class support for being resumed.
Device driver expander-0:0 lacks bus and class support for being resumed.
Device driver phy-0:1 lacks bus and class support for being resumed.
Device driver phy-0:2 lacks bus and class support for being resumed.
Device driver phy-0:3 lacks bus and class support for being resumed.
Device driver phy-0:0:4 lacks bus and class support for being resumed.
Device driver phy-0:0:5 lacks bus and class support for being resumed.
Device driver port-0:0:0 lacks bus and class support for being resumed.
Device driver end_device-0:0:0 lacks bus and class support for being resumed.
Device driver target0:0:0 lacks bus and class support for being resumed.
scsi 0:0:0:0: Direct-Access ATA SAMSUNG HD321KJ 0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:6 lacks bus and class support for being resumed.
Device driver port-0:0:1 lacks bus and class support for being resumed.
Device driver end_device-0:0:1 lacks bus and class support for being resumed.
Device driver target0:0:1 lacks bus and class support for being resumed.
scsi 0:0:1:0: Direct-Access ATA SAMSUNG HD321KJ 0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:7 lacks bus and class support for being resumed.
Device driver phy-0:0:8 lacks bus and class support for being resumed.
Device driver port-0:0:2 lacks bus and class support for being resumed.
Device driver end_device-0:0:2 lacks bus and class support for being resumed.
Device driver target0:0:2 lacks bus and class support for being resumed.
scsi 0:0:2:0: Direct-Access ATA SAMSUNG HD321KJ 0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:9 lacks bus and class support for being resumed.
Device driver port-0:0:3 lacks bus and class support for being resumed.
Device driver end_device-0:0:3 lacks bus and class support for being resumed.
Device driver target0:0:3 lacks bus and class support for being resumed.
scsi 0:0:3:0: Direct-Access ATA SAMSUNG HD321KJ 0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:10 lacks bus and class support for being resumed.
Device driver port-0:0:4 lacks bus and class support for being resumed.
Device driver phy-0:0:11 lacks bus and class support for being resumed.
Device driver phy-0:0:12 lacks bus and class support for being resumed.
Device driver phy-0:0:13 lacks bus and class support for being resumed.
Device driver phy-0:0:14 lacks bus and class support for being resumed.
Device driver port-0:0:5 lacks bus and class support for being resumed.
Device driver port-0:0:5 lacks bus and class support for being resumed.
Device driver end_device-0:0:5 lacks bus and class support for being resumed.
Device driver target0:0:4 lacks bus and class support for being resumed.
scsi 0:0:4:0: Enclosure ESG-SHV. SCA HSBP M13 2.04 PQ: 0 ANSI: 3
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 73 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 73 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Mode Sense: 73 00 00 08
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Mode Sense: 73 00 00 08
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2 sdb3 sdb4
sd 0:0:1:0: [sdb] Attached
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wednesday 29 of August 2007, Jens Axboe wrote:
> On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > > I guess I should send these here since it looks like it's not a
> > > > > > scsi bug anyway.
> > > > >
> > > > > It's stex, right? It seems to have some issues with multiple
> > > > > completions of commands, which craps out the block layer of course.
> > > >
> > > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine
> > > > in that version.
> > > >
> > > > So scsi bug ... 8-)
> > >
> > > And you based that conclusion on what exactly?
> >
> > Isn't drivers/scsi/* handled by [EMAIL PROTECTED] (that's what I mean)
>
> Yep indeed, I thought you meant that it was a scsi bug (and not an stex
> one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
> if that works, though.

Looks like this bug has been known for months :-(

Ed Lin pointed to http://lkml.org/lkml/2007/1/23/268 with a possible patch
(that unfortunately serialises access to storage devices, well...)

There is also: http://bugzilla.kernel.org/show_bug.cgi?id=7842

I'm running 2.6.22 with that patch now, did a huge (few hours) rsync that
previously caused oopses and now everything works properly.

Can we get some form of this patch into Linus' tree?

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
2.6.22 oops kernel BUG at block/elevator.c:366!
Hello,

I'm trying to get a stable kernel for Promise SuperTrak X16350 hardware.
So far 2.6.20, 2.6.21 and 2.6.22 oopsed like this (while doing rsync):

kernel BUG at block/elevator.c:366!
invalid opcode: [1] SMP
CPU 1
Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs scsi_wait_scan sd_mod stex scsi_mod
Pid: 1139:#0, comm: xfsbufd Not tainted 2.6.22.5-0.2 #1
RIP: 0010:[8033f5da] [8033f5da] elv_rb_del+0x3a/0x40
RSP: :8100759b1c00 EFLAGS: 00010046
RAX: 81000d1f5428 RBX: 81000d1f5428 RCX: 81007c1a1a00
RDX: RSI: 81000d1f53b0 RDI: 81007c102af0
RBP: 81000d1f53b0 R08: 81004a9dab50 R09:
R10: R11: 880072c0 R12: 81007c102ac0
R13: 81007c1a1a00 R14: 0004 R15: 81007c102b18
FS: 2ba2cafc9be0() GS:81007d0a5b40() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ba2cab5a158 CR3: 3c5ce000 CR4: 06e0
Process xfsbufd (pid: 1139[#0], threadinfo 8100759b, task 81007cac1040)
Stack: 0001 81007c102ac0 81000d1f53b0 8034abe8
 0246 81000d1f53b0 81007c1a1a00 81007c102ac0
 81007c0f2d08 0004 81007c102b18 8034ad55
Call Trace:
 [8034abe8] cfq_remove_request+0x78/0x1b0
 [8034ad55] cfq_dispatch_insert+0x35/0x70
 [8034b61f] cfq_dispatch_requests+0x1bf/0x3a0
 [8033f11f] elv_next_request+0x3f/0x150
 [80243b04] lock_timer_base+0x34/0x70
 [88007329] :scsi_mod:scsi_request_fn+0x69/0x3d0
 [80343d46] __make_request+0xe6/0x5d0
 [8034158b] generic_make_request+0x18b/0x230
 [8034438a] submit_bio+0x5a/0xf0
 [8808d1e9] :xfs:_xfs_buf_ioapply+0x199/0x340
 [8808e099] :xfs:xfs_buf_iorequest+0x29/0x80
 [88092fbb] :xfs:xfs_bdstrat_cb+0x3b/0x50
 [8808e3c2] :xfs:xfsbufd+0x92/0x140
 [8808e330] :xfs:xfsbufd+0x0/0x140
 [8024fa3b] kthread+0x4b/0x80
 [8020b0a8] child_rip+0xa/0x12
 [8024f9f0] kthread+0x0/0x80
 [8020b09e] child_rip+0x0/0x12

Code: 0f 0b eb fe 66 90 48 83 ec 08 49 89 f8 48 89 f8 31 c9 eb 09
RIP [8033f5da] elv_rb_del+0x3a/0x40
 RSP 8100759b1c00

I can reproduce it without much trouble.

Here are the same oopses on 2.6.20: http://paste.stgraber.org/3138

This is 1 x dual core athlon64 on asus m2npv mainboard, 2GB RAM. There is
hw raid on fasttrack 16350 only (no software one).

Has anyone seen this? Going to try without cfq.

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> Hello,
>
> I'm trying to get stable kernel for Promise SuperTrak X16350 hardware.
> So far 2.6.20, 2.6.21 and 2.6.22 oopsed like this (while doing rsync):

With anticipatory:

berta login: [ cut here ]
kernel BUG at block/as-iosched.c:1084!
invalid opcode: [1] SMP
CPU 1
Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs scsi_wait_scan sd_mod stex scsi_mod
Pid: 32:#0, comm: kblockd/1 Not tainted 2.6.22.5-0.2 #1
RIP: 0010:[80349028] [80349028] as_dispatch_request+0x438/0x460
RSP: 0018:81007d1fddc0 EFLAGS: 00010046
RAX: RBX: 81007c765a00 RCX:
RDX: 81007c765a28 RSI: RDI: 81007c54ad08
RBP: R08: R09: 81006a289d80
R10: R11: 0001 R12:
R13: 0001 R14: R15: 81007cf85048
FS: 2ba4421e8b00() GS:81007d0a5b40() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ba46298f000 CR3: 50951000 CR4: 06e0
Process kblockd/1 (pid: 32[#0], threadinfo 81007d1fc000, task 81007d1db040)
Stack: 81007c54ad08 81007cf85000 81007cf7e000
 81007d1fde00 81006a289cc0 8033f11f 0287
 88000fa8 81001646a6f8 81007cf85000 81007cf7e000
Call Trace:
 [8033f11f] elv_next_request+0x3f/0x150
 [88000fa8] :scsi_mod:scsi_dispatch_cmd+0x1c8/0x310
 [88007329] :scsi_mod:scsi_request_fn+0x69/0x3d0
 [80347b30] as_work_handler+0x0/0x50
 [80347b5c] as_work_handler+0x2c/0x50
 [8024b94c] run_workqueue+0xcc/0x170
 [8024c3a0] worker_thread+0x0/0x110
 [8024c3a0] worker_thread+0x0/0x110
 [8024c443] worker_thread+0xa3/0x110
 [8024fe10] autoremove_wake_function+0x0/0x30
 [8024c3a0] worker_thread+0x0/0x110
 [8024c3a0] worker_thread+0x0/0x110
 [8024fa3b] kthread+0x4b/0x80
 [8020b0a8] child_rip+0xa/0x12
 [8024f9f0] kthread+0x0/0x80
 [8020b09e] child_rip+0x0/0x12

Code: 0f 0b eb fe 0f 0b eb fe 31 ed c7 83 b8 00 00 00 01 00 00 00
RIP [80349028] as_dispatch_request+0x438/0x460
 RSP 81007d1fddc0

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> > Hello,
> >
> > I'm trying to get stable kernel for Promise SuperTrak X16350 hardware.
> > So far 2.6.20, 2.6.21 and 2.6.22 oopsed like this (while doing rsync):
>
> With anticipatory:
>
> berta login: [ cut here ]
> kernel BUG at block/as-iosched.c:1084!

One more piece of information: I've been running 2.6.19 for a few hours now
and the oops doesn't happen. Looks like a regression introduced between
2.6.19 and 2.6.20.

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/