Re: aacraid: kernel: AAC: Host adapter dead -1 (bisected)

2017-01-17 Thread Arkadiusz Miskiewicz
On Tuesday 17 of January 2017, Dave Carroll wrote:
> > Hi.
> > 
> > There is a bug in the handling of Adaptec RAID cards (in my case it is an
> > Adaptec 3405) where the kernel logs hundreds of "AAC: Host adapter dead -1"
> > messages.
> > 
> > The bug was reported previously on lkml but there was no progress in solving
> > it.
> > 
> > There is also a bugzilla entry:
> > https://bugzilla.kernel.org/show_bug.cgi?id=151661
> > 
> > I've bisected that to the commit below and indeed, reverting it from kernel
> > 4.9.3 makes the messages go away.
> > 
> > Could anyone at Microsemi look at this regression?
> > 
> > Thanks
> 
> Hi Arkadiusz,
> 
> Thanks for your effort in determining the cause of the issue. It makes
> sense now that the patch should have been included in controller specific
> code, rather than common code.
> 
> I will prepare a patch for this, and if you are willing to test it, that
> would be great!

Great!

I have a dedicated machine for testing this, so yes, I'll test.

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


aacraid: kernel: AAC: Host adapter dead -1 (bisected)

2017-01-15 Thread Arkadiusz Miskiewicz

Hi.

There is a bug in the handling of Adaptec RAID cards (in my case it is an Adaptec 
3405) where the kernel logs hundreds of "AAC: Host adapter dead -1" messages.

The bug was reported previously on lkml but there was no progress in solving it.

There is also a bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=151661

I've bisected that to the commit below and indeed, reverting it from kernel 4.9.3 
makes the messages go away.

Could anyone at Microsemi look at this regression?

Thanks

commit 78cbccd3bd683c295a44af8050797dc4a41376ff
Author: Raghava Aditya Renukunta 
Date:   Mon Apr 25 23:32:37 2016 -0700

aacraid: Fix for KDUMP driver hang

When KDUMP is triggered the driver first talks to the firmware in INTX
mode, but the adapter firmware is still in MSIX mode. Therefore the first
driver command hangs since the driver is waiting for an INTX response and
firmware gives a MSIX response. If when the OS is installed on a RAID
drive created by the adapter KDUMP will hang since the driver does not
receive a response in sync mode.

Fixed by: Change the firmware to INTX mode if it is in MSIX mode before
sending the first sync command.

Cc: sta...@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta 

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Martin K. Petersen 

my hardware:
02:0e.0 RAID bus controller [0104]: Adaptec AAC-RAID [9005:0285]
Subsystem: Adaptec 3405 [9005:02bb]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping+ SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR-


runtime change of use_blk_mq ?

2016-08-10 Thread Arkadiusz Miskiewicz

Hi.

Is runtime enabling/disabling of blk-mq supported?

Doesn't seem to work here:

4.1.30 (but the same thing on 4.6.3):

# zcat /proc/config.gz |grep _MQ_DEF
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set
# cat  /sys/module/scsi_mod/parameters/use_blk_mq
N
# grep "" /sys/block/sd*/queue/scheduler
/sys/block/sda/queue/scheduler:noop [deadline] cfq
/sys/block/sdb/queue/scheduler:noop [deadline] cfq
/sys/block/sdc/queue/scheduler:noop [deadline] cfq
/sys/block/sdd/queue/scheduler:noop [deadline] cfq
/sys/block/sde/queue/scheduler:noop [deadline] cfq
/sys/block/sdf/queue/scheduler:noop [deadline] cfq
/sys/block/sdg/queue/scheduler:noop [deadline] cfq
# echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
# cat  /sys/module/scsi_mod/parameters/use_blk_mq
Y
# grep "" /sys/block/sd*/queue/scheduler
/sys/block/sda/queue/scheduler:noop [deadline] cfq
/sys/block/sdb/queue/scheduler:noop [deadline] cfq
/sys/block/sdc/queue/scheduler:noop [deadline] cfq
/sys/block/sdd/queue/scheduler:noop [deadline] cfq
/sys/block/sde/queue/scheduler:noop [deadline] cfq
/sys/block/sdf/queue/scheduler:noop [deadline] cfq
/sys/block/sdg/queue/scheduler:noop [deadline] cfq

So use_blk_mq is Y, but queue/scheduler for the existing devices still lists 
I/O schedulers and shows deadline as active. It looks like blk-mq is not active.
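
The check above can be scripted. What follows is a minimal sketch, not a definitive tool: it assumes the bracket convention visible in the output above (the active scheduler is the name in square brackets), and `active_scheduler` is a hypothetical helper name. A queue with no scheduler attached is assumed to report a single word such as `none`.

```python
import re


def active_scheduler(contents):
    """Return the active scheduler from the text of a queue/scheduler file."""
    # The active entry is the one wrapped in square brackets.
    match = re.search(r"\[([^\]]+)\]", contents)
    if match:
        return match.group(1)
    # Assumption: without brackets the file holds one word, e.g. "none".
    return contents.strip() or None


# Sample taken from the output shown in this message.
print(active_scheduler("noop [deadline] cfq"))
```

Run on the contents of each /sys/block/sd*/queue/scheduler file before and after flipping use_blk_mq, an unchanged `deadline` result would support the observation that the existing devices did not switch.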

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


aacraid and rotational 1 even for ssd disks

2015-11-07 Thread Arkadiusz Miskiewicz

Hi.

I wonder whether aacraid should properly inform the kernel about SSD disks via 
the /sys/devices/../queue/rotational flag?

I'm using a 3.18.22 kernel, and the aacraid driver tells Linux that all drives 
are rotational:

raid:
# cat /sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host10/target10:0:0/10:0:0:0/block/sda/queue/rotational
1

lvm on top of that raid:
# cat /sys/devices/virtual/block/dm-*/queue/rotational
1
1
1

while this system has a RAID array of SSD drives only (INTEL SSDSC2BB30). No 
rotational disks.

Adaptec ASR8405 7.8-0 (32730) firmware.

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


help decoding aacraid errors (3.10.40 kernel)

2014-06-27 Thread Arkadiusz Miskiewicz

Hello.

I'm using a 3.10.40 kernel with an Adaptec 3405 and unfortunately I'm getting I/O 
errors and dmesg errors like below. Is there a howto on decoding these 
errors, or an explanation of what they actually mean?

Some details, including full dmesg output at
http://ixion.pld-linux.org/~arekm/p2/web3-adaptec-problem/
eg
http://ixion.pld-linux.org/~arekm/p2/web3-adaptec-problem/dmesg.txt

That doesn't seem to be a hard disk error (at least I don't see any bad errors 
from the arcconf tool).

[3757350.671843] sd 0:0:2:0: [sdc] CDB: 
[3757350.671844] cdb[0]=0x28: 28 00 00 a0 2c a8 00 00 08 00
[3757350.671856] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671858] sd 0:0:2:0: [sdc]  
[3757350.671860] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671862] sd 0:0:2:0: [sdc]  
[3757350.671863] Sense Key : 0x4 [current] 
[3757350.671866] sd 0:0:2:0: [sdc]  
[3757350.671868] ASC=0x44 ASCQ=0x0
[3757350.671870] sd 0:0:2:0: [sdc] CDB: 
[3757350.671872] cdb[0]=0x28: 28 00 00 a0 2c b0 00 00 08 00
[3757350.671884] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671886] sd 0:0:2:0: [sdc]  
[3757350.671888] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671890] sd 0:0:2:0: [sdc]  
[3757350.671892] Sense Key : 0x4 [current] 
[3757350.671894] sd 0:0:2:0: [sdc]  
[3757350.671896] ASC=0x44 ASCQ=0x0
[3757350.671899] sd 0:0:2:0: [sdc] CDB: 
[3757350.671900] cdb[0]=0x28: 28 00 00 a0 2c b8 00 00 08 00
[3757350.671912] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671914] sd 0:0:2:0: [sdc]  
[3757350.671916] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671918] sd 0:0:2:0: [sdc]  
[3757350.671920] Sense Key : 0x4 [current] 
[3757350.671923] sd 0:0:2:0: [sdc]  
[3757350.671924] ASC=0x44 ASCQ=0x0
[3757350.671927] sd 0:0:2:0: [sdc] CDB: 
[3757350.671928] cdb[0]=0x28: 28 00 00 a0 2c c0 00 00 08 00
[3757350.671939] sd 0:0:2:0: [sdc] Unhandled sense code
[3757350.671941] sd 0:0:2:0: [sdc]  
[3757350.671943] Result: hostbyte=0x00 driverbyte=0x08
[3757350.671945] sd 0:0:2:0: [sdc]  
[3757350.671947] Sense Key : 0x4 [current] 
[3757350.671950] sd 0:0:2:0: [sdc]  
[3757350.671951] ASC=0x44 ASCQ=0x0
[3757350.671954] sd 0:0:2:0: [sdc] CDB: 
[3757350.671955] cdb[0]=0x28: 28 00 00 a0 2c c8 00 00 08 00
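
For reference, the CDB lines in the log above can be decoded by hand. A minimal sketch, assuming the standard 10-byte READ/WRITE layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte big-endian transfer length, control); `decode_cdb10` is a hypothetical helper, not part of any kernel tool.

```python
import struct

# Only the two opcodes that appear in these threads.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_bytes):
    """Decode a 10-byte CDB given as space-separated hex, as printed by sd."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    # B=opcode, B=flags, I=LBA, B=group number, H=transfer length, B=control
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", raw)
    return {
        "name": OPCODES.get(opcode, "opcode 0x%02x" % opcode),
        "lba": lba,
        "blocks": blocks,
    }


print(decode_cdb10("28 00 00 a0 2c a8 00 00 08 00"))
```

For the first CDB in the log this gives a READ(10) of 8 blocks at LBA 0xa02ca8; the following CDBs advance the LBA by 8 each time, so the failing reads are consecutive.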


-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Q: vger.kernel.org postmasters - always rude and impolite, hmm?


Re: help decoding aacraid errors (3.10.40 kernel)

2014-06-27 Thread Arkadiusz Miskiewicz
On Friday 27 of June 2014, Bryn M. Reeves wrote:
> On Fri, Jun 27, 2014 at 10:55:08AM +0200, Arkadiusz Miskiewicz wrote:
> > [3757350.671860] Result: hostbyte=0x00 driverbyte=0x08
> > [3757350.671862] sd 0:0:2:0: [sdc]
> > [3757350.671863] Sense Key : 0x4 [current]
>
> http://www.t10.org/lists/2sensekey.htm
>
> 0x4 is hardware error.
>
> > [3757350.671866] sd 0:0:2:0: [sdc]
> > [3757350.671868] ASC=0x44 ASCQ=0x0
>
> http://www.t10.org/lists/asc-num.htm
>
> 0x44/0x00 is internal target failure.

Thanks for the links. I wonder why the kernel doesn't decode these into 
something readable without having to ask on the ml. Was decoding considered?
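
When the log shows only numbers, they can be looked up by hand. A minimal sketch that does the lookup for log lines in the format shown in this thread: the sense key names follow the T10 list linked above, the only ASC/ASCQ entry is the one identified in this thread, and `decode_sense` is a hypothetical helper name.

```python
import re

# Sense key names as listed by T10 (subset).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0xB: "Aborted Command",
}

# Only the ASC/ASCQ pair discussed in this thread; extend from the T10 list.
ASC_ASCQ = {(0x44, 0x00): "Internal target failure"}


def decode_sense(log_text):
    """Decode 'Sense Key : 0x..' and 'ASC=0x.. ASCQ=0x..' from kernel log text."""
    key = re.search(r"Sense Key : (0x[0-9a-fA-F]+)", log_text)
    asc = re.search(r"ASC=(0x[0-9a-fA-F]+) ASCQ=(0x[0-9a-fA-F]+)", log_text)
    result = {}
    if key:
        value = int(key.group(1), 16)
        result["sense_key"] = SENSE_KEYS.get(value, "unknown (0x%x)" % value)
    if asc:
        pair = (int(asc.group(1), 16), int(asc.group(2), 16))
        result["asc_ascq"] = ASC_ASCQ.get(pair, "unknown (0x%02x/0x%02x)" % pair)
    return result


sample = """
[3757350.671863] Sense Key : 0x4 [current]
[3757350.671868] ASC=0x44 ASCQ=0x0
"""
print(decode_sense(sample))
```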

Anyway Adaptec support helped with details:

http://ask.adaptec.com/app/answers/detail/a_id/14947/track/AvOF~QqxDv8S~TbvGmIW~yLb_fsq5C75Mv~s~zj~PP8l

Basically this controller finds bad stripes and blocks any access to these 
areas, resulting in errors :-/ What's more interesting, there is no recovery 
procedure besides recreation. Fun :-)

> Regards,
> Bryn.


-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Q: vger.kernel.org postmasters - always rude and impolite, hmm?


Re: help decoding aacraid errors (3.10.40 kernel)

2014-06-27 Thread Arkadiusz Miskiewicz
On Friday 27 of June 2014, Bryn M. Reeves wrote:
> On Fri, Jun 27, 2014 at 12:59:18PM +0200, Arkadiusz Miskiewicz wrote:
> > Thanks for the links. I wonder why the kernel doesn't decode these into
> > something readable without having to ask on the ml. Was decoding considered?
>
> Normally it does; I was a bit surprised to see numbers printed with such
> a recent kernel.
>
> Sense key decoding to text has been around almost forever (the 'snstext'
> table of sense strings pre-dates git, i.e. 2.6.12ish).
>
> Is it possible your kernel was built without CONFIG_SCSI_CONSTANTS?

Bingo! Thanks for your help.

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Q: vger.kernel.org postmasters - always rude and impolite, hmm?


Re: mptsas related problem [2.6.22], LSI SAS1064ET PCI-E

2007-09-18 Thread Arkadiusz Miskiewicz
On Monday 10 of September 2007, Arkadiusz Miskiewicz wrote:
> Hello,
>
> SR2520SAXS platform (S5000VSA mainboard in 2U SR2520 chassis) with
>
> 08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET
> PCI-Express Fusion-MPT SAS (rev 02)
>
> onboard (well afaik on backplane) and four SATA discs in various software
> RAID levels (1, 5 and 10) per partition.
>
> Under heavier I/O load mptsas kicks out hard disks like:
> mptsas: ioc0: removing sata device, channel 0, id 24, phy 2

I was able to reproduce it on a second machine (also SR2520SAXS platform) in ~2 
hours of heavy I/O with other Samsung hard disks (400GB, previously 320GB). I 
also tested a WDC WD3200YS-01P 320GB hard disk but the problem didn't occur 
with it even after 20h of testing.

Looks like it's triggered only with some hard disks. I'm going to test Seagate 
disks now.

> mptscsih: ioc0: attempting task abort! (sc=810158ecb540)
> sd 0:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 0a ad 28 de 00 00 10 00
> mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort},
> SubCode(0x0403)
> mptbase: ioc0: LogInfo(0x3114): Originator={PL}, Code={IO Executed},
> SubCode(0x)
> mptscsih: ioc0: task abort: SUCCESS (sc=810158ecb540)
> mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort},
> SubCode(0x0403)
> mptscsih: ioc0: attempting target reset! (sc=810158ecb540)
> sd 0:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 0a ad 28 de 00 00 10 00
> mptscsih: ioc0: target reset: SUCCESS (sc=810158ecb540)
> mptscsih: ioc0: attempting task abort! (sc=81006ea80b40)
> sd 0:0:1:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 16 21 3c 00 00 0a 00
> mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort},
> SubCode(0x0403)
> mptbase: ioc0: LogInfo(0x3114): Originator={PL}, Code={IO Executed},
> SubCode(0x)
> mptscsih: ioc0: task abort: SUCCESS (sc=81006ea80b40)
> mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort},
> SubCode(0x0403)
> mptbase: ioc0: LogInfo(0x3000): Originator={PL}, Code={Reset},
> SubCode(0x1000)
> mptbase: ioc0: LogInfo(0x3000): Originator={PL}, Code={Reset},
> SubCode(0x1000)
> mptscsih: ioc0: attempting target reset! (sc=81006ea80b40)
> sd 0:0:1:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 16 21 3c 00 00 0a 00
> mptscsih: ioc0: target reset: SUCCESS (sc=81006ea80b40)
> mptscsih: ioc0: attempting task abort! (sc=81006ea80b40)
> sd 0:0:1:0: [sdb] CDB: cdb[0]=0x0: 00 00 00 00 00 00
> mptbase: ioc0: LogInfo(0x3113): Originator={PL}, Code={IO Not Yet
> Executed}, SubCode(0x)
> mptscsih: ioc0: task abort: SUCCESS (sc=81006ea80b40)
> mptscsih: ioc0: attempting bus reset! (sc=81006ea80b40)
> sd 0:0:1:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 02 16 21 3c 00 00 0a 00
> mptscsih: ioc0: bus reset: SUCCESS (sc=81006ea80b40)
> mptscsih: ioc0: attempting task abort! (sc=81006ea80b40)
> sd 0:0:1:0: [sdb] CDB: cdb[0]=0x0: 00 00 00 00 00 00
> mptbase: ioc0: LogInfo(0x3113): Originator={PL}, Code={IO Not Yet
> Executed}, SubCode(0x)
> mptscsih: ioc0: task abort: SUCCESS (sc=81006ea80b40)
> mptscsih: ioc0: attempting task abort! (sc=81006ea806c0)
> sd 0:0:1:0: [sdb] CDB: cdb[0]=0x0: 00 00 00 00 00 00
> mptbase: ioc0: LogInfo(0x3113): Originator={PL}, Code={IO Not Yet
> Executed}, SubCode(0x)
> mptscsih: ioc0: task abort: SUCCESS (sc=81006ea806c0)
> mptscsih: ioc0: Attempting host reset! (sc=81006ea80b40)
> mptbase: Initiating ioc0 recovery
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 35004742
> raid10: Disk failure on sdb3, disabling device.
> Operation continuing on 3 devices
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 35004732
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 50036487
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 57635839
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 75634150
> raid5: Disk failure on sdb4, disabling device. Operation continuing on 3
> devices
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 82503262
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 83868190
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 105214542
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 131203934
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 161928270
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 161928398
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 175936606
> sd 0:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x00
> end_request: I/O error, dev sdb, sector 187551886
> sd 0:0:1:0: [sdb] Result

Re: mptsas related problem [2.6.22], LSI SAS1064ET PCI-E

2007-09-18 Thread Arkadiusz Miskiewicz
On Tuesday 18 of September 2007, Moore, Eric wrote:
> On Monday, September 10, 2007 11:56 AM, Arkadiusz Miskiewicz wrote:
> > SR2520SAXS platform (S5000VSA mainboard in 2U SR2520 chassis) with
> >
> > 08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET
> > PCI-Express Fusion-MPT SAS (rev 02)
>
> You have a 1064E B1 part. How much system memory do you have?

4GB on the first and 1GB on the second.

> > ps. If someone knows how FwRev=0110h relates to firmware
> > version at
> > http://downloadcenter.intel.com/filter_results.aspx?strTypes=all&ProductID=2487&OSFullName=OS+Independent&lang=eng&strOSs=38&submit=Go%21
> > then that's also interesting. Simply 01 10 ...== 1.16.0.0 ?
>
> 1.16 firmware is Phase 7 firmware, which is over a year old.  Perhaps
> you could attempt upgrading by obtaining it from
> http://www.lsi.com/support/index.html or sending a request to
> [EMAIL PROTECTED]   Phase 11 just released.
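
The reading suggested above (each pair of hex digits is one component of the version) can be written down as a short sketch. This is an assumption drawn from this thread, not from LSI documentation, and `fwrev_to_version` is a hypothetical helper name.

```python
def fwrev_to_version(fwrev):
    """Turn a FwRev string such as '0110h' into a dotted version string."""
    digits = fwrev.rstrip("hH")
    # Pad to a whole number of bytes so fromhex() accepts the string.
    if len(digits) % 2:
        digits = "0" + digits
    # Assumption: every byte is one version component, printed in decimal.
    return ".".join(str(byte) for byte in bytes.fromhex(digits))


print(fwrev_to_version("0110h"))
```

With the value from the boot log, 01 becomes 1 and 10 (hex) becomes 16, which agrees with the 1.16 named in the reply.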

The question is: is it safe to use lsi.com firmware on the Intel SR2520SAXS 
platform, where the SAS1064ET resides on the backplane (afaik)? (There is no new 
firmware on the Intel website.)

> > sd 0:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 0a ad 28 de 00 00 10 00
> > mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort},
> > SubCode(0x0403)
>
> 0x403 means FRAME_XFER_ERROR, which says your command packet didn't make
> it.
>
> > mptbase: ioc0: LogInfo(0x3000): Originator={PL}, Code={Reset},
> > SubCode(0x1000)
>
> 0x1000 means SATA_INIT_TIMEOUT
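
The LogInfo values discussed above can be split into their fields with a few shifts. This is a sketch under assumptions: the field layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit subcode, from the top bit down) is assumed here rather than taken from this thread, and the code and subcode name tables contain only what appears in the log and in the reply. `decode_loginfo` is a hypothetical helper.

```python
# Originator index, assumed to follow the order IOP, PL, IR.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}
# Names taken from this thread only.
CODES = {0x12: "Abort"}
SUBCODES = {0x0403: "FRAME_XFER_ERROR", 0x1000: "SATA_INIT_TIMEOUT"}


def decode_loginfo(loginfo):
    """Split a 32-bit LogInfo word into bus type, originator, code, subcode."""
    originator = (loginfo >> 24) & 0xF
    code = (loginfo >> 16) & 0xFF
    subcode = loginfo & 0xFFFF
    return {
        "bus_type": (loginfo >> 28) & 0xF,
        "originator": ORIGINATORS.get(originator, "unknown"),
        "code": code,
        "code_name": CODES.get(code, "unknown"),
        "subcode": subcode,
        "subcode_name": SUBCODES.get(subcode, "unknown"),
    }


print(decode_loginfo(0x31120403))
```

For 0x31120403 the fields come out as originator PL, code 0x12 and subcode 0x0403, matching the Originator={PL}, Code={Abort}, SubCode(0x0403) text that the driver printed.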

> Can you obtain a SAS trace?

Probably; just tell me how to get a SAS trace.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


mptsas related problem [2.6.22], LSI SAS1064ET PCI-E

2007-09-10 Thread Arkadiusz Miskiewicz

Hello,

SR2520SAXS platform (S5000VSA mainboard in 2U SR2520 chassis) with

08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET 
PCI-Express Fusion-MPT SAS (rev 02)

onboard (well afaik on backplane) and four SATA discs in various software 
RAID levels (1, 5 and 10) per partition.

Under heavier I/O load mptsas kicks out hard disks like:
mptsas: ioc0: removing sata device, channel 0, id 24, phy 2

So far it dropped sdb twice and sdc once. Not sure if these are faulty 
hard drives, controller problems or driver problems. Any hints?

ps. If someone knows how FwRev=0110h relates to firmware version at
http://downloadcenter.intel.com/filter_results.aspx?strTypes=all&ProductID=2487&OSFullName=OS+Independent&lang=eng&strOSs=38&submit=Go%21
then that's also interesting. Simply 01 10 ...== 1.16.0.0 ?

mptbase: Initiating ioc0 bringup
ioc0: SAS1064E: Capabilities={Initiator}
PCI: Setting latency timer of device :08:00.0 to 64
scsi0 : ioc0: LSISAS1064E, FwRev=0110h, Ports=1, MaxQ=511, IRQ=16
Device driver host0 lacks bus and class support for being resumed.
Device driver phy-0:0 lacks bus and class support for being resumed.
Device driver port-0:0 lacks bus and class support for being resumed.
Device driver expander-0:0 lacks bus and class support for being resumed.
Device driver phy-0:1 lacks bus and class support for being resumed.
Device driver phy-0:2 lacks bus and class support for being resumed.
Device driver phy-0:3 lacks bus and class support for being resumed.
Device driver phy-0:0:4 lacks bus and class support for being resumed.
Device driver phy-0:0:5 lacks bus and class support for being resumed.
Device driver port-0:0:0 lacks bus and class support for being resumed.
Device driver end_device-0:0:0 lacks bus and class support for being resumed.
Device driver target0:0:0 lacks bus and class support for being resumed.
scsi 0:0:0:0: Direct-Access ATA  SAMSUNG HD321KJ  0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:6 lacks bus and class support for being resumed.
Device driver port-0:0:1 lacks bus and class support for being resumed.
Device driver end_device-0:0:1 lacks bus and class support for being resumed.
Device driver target0:0:1 lacks bus and class support for being resumed.
scsi 0:0:1:0: Direct-Access ATA  SAMSUNG HD321KJ  0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:7 lacks bus and class support for being resumed.
Device driver phy-0:0:8 lacks bus and class support for being resumed.
Device driver port-0:0:2 lacks bus and class support for being resumed.
Device driver end_device-0:0:2 lacks bus and class support for being resumed.
Device driver target0:0:2 lacks bus and class support for being resumed.
scsi 0:0:2:0: Direct-Access ATA  SAMSUNG HD321KJ  0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:9 lacks bus and class support for being resumed.
Device driver port-0:0:3 lacks bus and class support for being resumed.
Device driver end_device-0:0:3 lacks bus and class support for being resumed.
Device driver target0:0:3 lacks bus and class support for being resumed.
scsi 0:0:3:0: Direct-Access ATA  SAMSUNG HD321KJ  0-10 PQ: 0 ANSI: 5
Device driver phy-0:0:10 lacks bus and class support for being resumed.
Device driver port-0:0:4 lacks bus and class support for being resumed.
Device driver phy-0:0:11 lacks bus and class support for being resumed.
Device driver phy-0:0:12 lacks bus and class support for being resumed.
Device driver phy-0:0:13 lacks bus and class support for being resumed.
Device driver phy-0:0:14 lacks bus and class support for being resumed.
Device driver port-0:0:5 lacks bus and class support for being resumed.
Device driver port-0:0:5 lacks bus and class support for being resumed.
Device driver end_device-0:0:5 lacks bus and class support for being resumed.
Device driver target0:0:4 lacks bus and class support for being resumed.
scsi 0:0:4:0: Enclosure ESG-SHV. SCA HSBP M13 2.04 PQ: 0 ANSI: 3
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 73 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 73 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Mode Sense: 73 00 00 08
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Mode Sense: 73 00 00 08
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sdb: sdb1 sdb2 sdb3 sdb4
sd 0:0:1:0: [sdb] Attached 

Re: 2.6.22 oops kernel BUG at block/elevator.c:366!

2007-08-30 Thread Arkadiusz Miskiewicz
On Wednesday 29 of August 2007, Jens Axboe wrote:
> On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > > I guess I should send these here since it looks like not scsi bug
> > > > > > anyway.
> > > > >
> > > > > It's stex, right? It seems to have some issues with multiple
> > > > > completions of commands, which craps out the block layer of course.
> > > >
> > > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine
> > > > in that version.
> > > >
> > > > So scsi bug ... 8-)
> > >
> > > And you based that conclusion on what exactly?
> >
> > Isn't drivers/scsi/* handled by [EMAIL PROTECTED] (that's what I mean)
>
> Yep indeed, I thought you meant that it was a scsi bug (and not an stex
> one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
> if that works, though.

Looks like this bug has been known for months :-(

Ed Lin pointed to http://lkml.org/lkml/2007/1/23/268 with a possible patch (that 
unfortunately serialises access to storage devices, well...)

There is also: http://bugzilla.kernel.org/show_bug.cgi?id=7842

I'm running 2.6.22 with that patch now; I did a huge (few hours) rsync that 
previously caused oopses and now everything works properly.

Can we get some form of this patch into Linus' tree?

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


2.6.22 oops kernel BUG at block/elevator.c:366!

2007-08-29 Thread Arkadiusz Miskiewicz
Hello,

I'm trying to get a stable kernel for Promise SuperTrak 
X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
like this (while doing rsync):

kernel BUG at block/elevator.c:366!
invalid opcode:  [1] SMP
CPU 1
Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs 
scsi_wait_scan sd_mod stex scsi_mod
Pid: 1139:#0, comm: xfsbufd Not tainted 2.6.22.5-0.2 #1
RIP: 0010:[8033f5da]  [8033f5da] elv_rb_del+0x3a/0x40
RSP: :8100759b1c00  EFLAGS: 00010046
RAX: 81000d1f5428 RBX: 81000d1f5428 RCX: 81007c1a1a00
RDX:  RSI: 81000d1f53b0 RDI: 81007c102af0
RBP: 81000d1f53b0 R08: 81004a9dab50 R09: 
R10:  R11: 880072c0 R12: 81007c102ac0
R13: 81007c1a1a00 R14: 0004 R15: 81007c102b18
FS:  2ba2cafc9be0() GS:81007d0a5b40() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ba2cab5a158 CR3: 3c5ce000 CR4: 06e0
Process xfsbufd (pid: 1139[#0], threadinfo 8100759b, task 
81007cac1040)
Stack:  0001 81007c102ac0 81000d1f53b0 8034abe8
 0246 81000d1f53b0 81007c1a1a00 81007c102ac0
 81007c0f2d08 0004 81007c102b18 8034ad55
Call Trace:
 [8034abe8] cfq_remove_request+0x78/0x1b0
 [8034ad55] cfq_dispatch_insert+0x35/0x70
 [8034b61f] cfq_dispatch_requests+0x1bf/0x3a0
 [8033f11f] elv_next_request+0x3f/0x150
 [80243b04] lock_timer_base+0x34/0x70
 [88007329] :scsi_mod:scsi_request_fn+0x69/0x3d0
 [80343d46] __make_request+0xe6/0x5d0
 [8034158b] generic_make_request+0x18b/0x230
 [8034438a] submit_bio+0x5a/0xf0
 [8808d1e9] :xfs:_xfs_buf_ioapply+0x199/0x340
 [8808e099] :xfs:xfs_buf_iorequest+0x29/0x80
 [88092fbb] :xfs:xfs_bdstrat_cb+0x3b/0x50
 [8808e3c2] :xfs:xfsbufd+0x92/0x140
 [8808e330] :xfs:xfsbufd+0x0/0x140
 [8024fa3b] kthread+0x4b/0x80
 [8020b0a8] child_rip+0xa/0x12
 [8024f9f0] kthread+0x0/0x80
 [8020b09e] child_rip+0x0/0x12


Code: 0f 0b eb fe 66 90 48 83 ec 08 49 89 f8 48 89 f8 31 c9 eb 09
RIP  [8033f5da] elv_rb_del+0x3a/0x40
 RSP 8100759b1c00


I can reproduce it without much trouble.


Here are the same oopses on 2.6.20:
http://paste.stgraber.org/3138

This is 1 x dual core athlon64 on asus m2npv mainboard, 2GB RAM.
There is hw raid on fasttrack 16350 only (no software one).

Has anyone seen this?

Going to try without cfq.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


Re: 2.6.22 oops kernel BUG at block/elevator.c:366!

2007-08-29 Thread Arkadiusz Miskiewicz
On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> Hello,
>
> I'm trying to get stable kernel for Promise SuperTrak
> X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
> like this (while doing rsync):

With anticipatory:

berta login: ------------[ cut here ]------------
kernel BUG at block/as-iosched.c:1084!
invalid opcode:  [1] SMP
CPU 1
Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs 
scsi_wait_scan sd_mod stex scsi_mod
Pid: 32:#0, comm: kblockd/1 Not tainted 2.6.22.5-0.2 #1
RIP: 0010:[80349028]  [80349028] 
as_dispatch_request+0x438/0x460
RSP: 0018:81007d1fddc0  EFLAGS: 00010046
RAX:  RBX: 81007c765a00 RCX: 
RDX: 81007c765a28 RSI:  RDI: 81007c54ad08
RBP:  R08:  R09: 81006a289d80
R10:  R11: 0001 R12: 
R13: 0001 R14:  R15: 81007cf85048
FS:  2ba4421e8b00() GS:81007d0a5b40() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ba46298f000 CR3: 50951000 CR4: 06e0
Process kblockd/1 (pid: 32[#0], threadinfo 81007d1fc000, task 
81007d1db040)
Stack:  81007c54ad08 81007cf85000 81007cf7e000 81007d1fde00
 81006a289cc0 8033f11f 0287 88000fa8
 81001646a6f8  81007cf85000 81007cf7e000
Call Trace:
 [8033f11f] elv_next_request+0x3f/0x150
 [88000fa8] :scsi_mod:scsi_dispatch_cmd+0x1c8/0x310
 [88007329] :scsi_mod:scsi_request_fn+0x69/0x3d0
 [80347b30] as_work_handler+0x0/0x50
 [80347b5c] as_work_handler+0x2c/0x50
 [8024b94c] run_workqueue+0xcc/0x170
 [8024c3a0] worker_thread+0x0/0x110
 [8024c3a0] worker_thread+0x0/0x110
 [8024c443] worker_thread+0xa3/0x110
 [8024fe10] autoremove_wake_function+0x0/0x30
 [8024c3a0] worker_thread+0x0/0x110
 [8024c3a0] worker_thread+0x0/0x110
 [8024fa3b] kthread+0x4b/0x80
 [8020b0a8] child_rip+0xa/0x12
 [8024f9f0] kthread+0x0/0x80
 [8020b09e] child_rip+0x0/0x12


Code: 0f 0b eb fe 0f 0b eb fe 31 ed c7 83 b8 00 00 00 01 00 00 00
RIP  [80349028] as_dispatch_request+0x438/0x460
 RSP 81007d1fddc0

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


Re: 2.6.22 oops kernel BUG at block/elevator.c:366!

2007-08-29 Thread Arkadiusz Miskiewicz
On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> > Hello,
> >
> > I'm trying to get stable kernel for Promise SuperTrak
> > X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
> > like this (while doing rsync):
>
> With anticipatory:
>
> berta login: ------------[ cut here ]------------
> kernel BUG at block/as-iosched.c:1084!

One more piece of information: I've been running 2.6.19 for a few hours and the 
oops doesn't happen. Looks like a regression introduced between 2.6.19 and 
2.6.20.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/