Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-20 Thread Travis Tabbal
 The latter; we run these VMs over NFS anyway and had
 ESXi boxes under test already, and we were already
 separating data exports from VM exports. We use
 an in-house developed configuration management/bare
 metal system which allows us to install new machines
 pretty easily. In this case we just provisioned the
 ESXi VMs to new VM exports on the Thor whilst
 re-using the data exports as they were...


Thanks for the info. Unfortunately, I need this box to do double duty and run 
the VMs as well. The hardware is capable; this issue with XvM and/or the mpt 
driver just needs to get fixed. Other than that, things are running great with 
this server.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-19 Thread Jeroen Roodhart
How did your migration to ESXi go? Are you using it on the same hardware or 
did you just switch that server to an NFS server and run the VMs on another 
box?

The latter; we run these VMs over NFS anyway and had ESXi boxes under test 
already, and we were already separating data exports from VM exports. We use an 
in-house developed configuration management/bare metal system which allows us 
to install new machines pretty easily. In this case we just provisioned the 
ESXi VMs to new VM exports on the Thor whilst re-using the data exports as 
they were...

Works pretty well, although the Sun x1027A 10G NICs aren't yet supported under 
ESXi 4...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-12 Thread Jeroen Roodhart
 I'm running nv126 XvM right now. I haven't tried it
 without XvM.

Without XvM we do not see these issues. We're running the VMs through NFS now 
(using ESXi)...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-12 Thread Travis Tabbal
  I'm running nv126 XvM right now. I haven't tried
 it
  without XvM.
 
 Without XvM we do not see these issues. We're running
 the VMs through NFS now (using ESXi)...

Interesting. It sounds like it might be an XvM specific bug. I'm glad I 
mentioned that in my bug report to Sun. Hopefully they can duplicate it. I'd 
like to stick with XvM as I've spent a fair amount of time getting things 
working well under it. 

How did your migration to ESXi go? Are you using it on the same hardware or did 
you just switch that server to an NFS server and run the VMs on another box?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-12 Thread James C. McPherson

Travis Tabbal wrote:

I'm running nv126 XvM right now. I haven't tried it without XvM.

Without XvM we do not see these issues. We're running
the VMs through NFS now (using ESXi)...


Interesting. It sounds like it might be an XvM specific bug. I'm glad I mentioned that in my bug report to Sun. Hopefully they can duplicate it. I'd like to stick with XvM as I've spent a fair amount of time getting things working well under it. 


How did your migration to ESXi go? Are you using it on the same hardware or did 
you just switch that server to an NFS server and run the VMs on another box?



Hi Travis,
your bug showed up - it's 6900767. Since bugs.opensolaris.org
isn't a live system, you won't be able to see it at

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6900767

until tomorrow.


cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-03 Thread Jeroen Roodhart
We see the same issue on an x4540 Thor system with 500G disks:

lots of:
...
Nov  3 16:41:46 uva.nl scsi: [ID 107833 kern.warning] WARNING: 
/p...@3c,0/pci10de,3...@f/pci1000,1...@0 (mpt5):
Nov  3 16:41:46 encore.science.uva.nl   Disconnected command timeout for Target 7
...

This system is running nv125 XvM. It seems to occur more when we are using VMs. 
This of course causes very long interruptions on the VMs as well...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-31 Thread Travis Tabbal
I am also running 2 of the Supermicro cards. I just upgraded to b126 and it 
seems improved. I am running a large file copy locally. I get these warnings in 
the dmesg log. When I do, I/O seems to stall for about 60sec. It comes back up 
fine, but it's very annoying. Any hints? I have 4 disks per controller right 
now, different brands, sizes, everything. New SATA fanout cables and no 
expanders. 

The drives on mpt0 and mpt1 are completely different: 4x400GB Seagate drives and 
4x1.5TB Samsung drives. I get the problem from both controllers. I didn't 
notice this until about b124. I can reproduce it with rsync copying files 
locally between ZFS filesystems and with --bwlimit=1 (10MB/sec). Keeping 
the limit low does seem to help. 
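
For what it's worth, that kind of throttled local copy looks roughly like the 
following (the paths and the limit value here are placeholders, not the exact 
ones used above; rsync's --bwlimit is expressed in KB/s):

  # throttled local copy between two ZFS filesystems (placeholder paths/limit)
  rsync -a --bwlimit=10240 /tank/sourcefs/ /tank/destfs/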

---

Oct 31 23:05:32 nas scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@3/pci15d9,a...@0 (mpt1):
Oct 31 23:05:32 nas Disconnected command timeout for Target 7
Oct 31 23:09:42 nas scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@2/pci15d9,a...@0 (mpt0):
Oct 31 23:09:42 nas Disconnected command timeout for Target 1
Oct 31 23:16:23 nas scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@2/pci15d9,a...@0 (mpt0):
Oct 31 23:16:23 nas Disconnected command timeout for Target 3
Oct 31 23:18:43 nas scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@3/pci15d9,a...@0 (mpt1):
Oct 31 23:18:43 nas Disconnected command timeout for Target 6
Oct 31 23:27:24 nas scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@3/pci15d9,a...@0 (mpt1):
Oct 31 23:27:24 nas Disconnected command timeout for Target 7
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-26 Thread David Turnbull
I'm having similar issues with two AOC-USAS-L8i Supermicro 1068e cards 
(mpt2 and mpt3), running 1.26.00.00 IT firmware.

It seems to only affect a specific revision of disk. (???)

sd67  Soft Errors: 0 Hard Errors: 127 Transport Errors: 3416
Vendor: ATA  Product: WDC WD10EACS-00D Revision: 1A01 Serial No:
Size: 1000.20GB 1000204886016 bytes

sd58  Soft Errors: 0 Hard Errors: 83 Transport Errors: 2087
Vendor: ATA  Product: WDC WD10EACS-00D Revision: 1A01 Serial No:
Size: 1000.20GB 1000204886016 bytes

There are 8 other disks on the two controllers:
6xWDC WD10EACS-00Z Revision: 1B01 (no errors)
2xSAMSUNG HD103UJ  Revision: 1113 (no errors)

The two EACS-00D disks are in separate enclosures with new SAS-SATA 
fanout cables.


Example error messages:

Oct 27 14:26:05 fleet scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/ 
pci1002,5...@2/pci15d9,a...@0 (mpt2):

Oct 27 14:26:05 fleet   wwn for target has changed

Oct 27 14:25:56 fleet scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/ 
pci1002,5...@3/pci15d9,a...@0 (mpt3):

Oct 27 14:25:56 fleet   wwn for target has changed

Oct 27 14:25:57 fleet scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/ 
pci1002,5...@2/pci15d9,a...@0 (mpt2):
Oct 27 14:25:57 fleet   mpt_handle_event_sync: IOCStatus=0x8000,  
IOCLogInfo=0x31110d00


Oct 27 14:25:48 fleet scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/ 
pci1002,5...@3/pci15d9,a...@0 (mpt3):
Oct 27 14:25:48 fleet   mpt_handle_event_sync: IOCStatus=0x8000,  
IOCLogInfo=0x31110d00


Oct 27 14:26:01 fleet scsi: [ID 365881 kern.info] /p...@0,0/ 
pci1002,5...@2/pci15d9,a...@0 (mpt2):

Oct 27 14:26:01 fleet   Log info 0x31110d00 received for target 1.
Oct 27 14:26:01 fleet   scsi_status=0x0, ioc_status=0x804b,  
scsi_state=0xc


Oct 27 14:25:51 fleet scsi: [ID 365881 kern.info] /p...@0,0/ 
pci1002,5...@3/pci15d9,a...@0 (mpt3):

Oct 27 14:25:51 fleet   Log info 0x31120403 received for target 2.
Oct 27 14:25:51 fleet   scsi_status=0x0, ioc_status=0x804b,  
scsi_state=0xc


On 22/10/2009, at 10:40 PM, Bruno Sousa wrote:


Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started 
to see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: / 
p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000,  
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: / 
p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000,  
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: / 
p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000,  
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: / 
p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000,  
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: / 
p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000,  
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the 
driver so that I now get more information that didn't appear in 
the past?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this 
behaviour:



1 MPT Port found

 Port Name      Chip Vendor/Type/Rev     MPT Rev  Firmware Rev  IOC
 1.  mpt0       LSI Logic SAS1068E B3      105        011a        0


Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link 

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-25 Thread Adam Cheal
So, while we are working on resolving this issue with Sun, let me approach this 
from another perspective: what kind of controller/drive ratio would be the 
minimum recommended to support a functional OpenSolaris-based archival 
solution? Given the following:

- the vast majority of IO to the system is going to be read-oriented, other 
than the initial load of the archive shares and possibly scrubs/resilvering 
in the case of failed drives
- we currently have one LSISAS3801E with two external ports; each port connects 
to one 23-disk JBOD
- Each JBOD has the ability to take in two external SAS connections if we 
enable the split-backplane option on it which would split the disk IO path 
between the two connectors (12 disks on one connector, 11 on the other); we do 
not currently have this enabled
- our current server platform only has 1 x PCIe-x8 slot available; we *could* 
look at changing this in the future, but I'd prefer to find a one-card solution 
if possible

Here is the math I did that shows the current IO situation (PLEASE correct this 
if I am mistaken, as I am somewhat winging it here and my head hurts) :

Based on info from:

http://storageadvisors.adaptec.com/2006/07/26/sas-drive-performance/
http://en.wikipedia.org/wiki/PCI_Express
http://support.wdc.com/product/kb.asp?modelno=WD1002FBYSx=9y=8

WD1002FBYS 1TB SATA2 7200rpm drive specs
Avg seek time = 8.9ms
Avg latency = 4.2ms
Max transfer speed = 112 MB/s
Avg transfer speed ~= 65 MB/s

Random IO scenario (theoretical numbers):
8.9ms avg seek time + 4.2ms avg latency = 13.1 ms avg access time
1/0.0131 = 76 IOPS/drive
22 (23 - 1 spare) drives x 76 IOPS/drive = 1672 IOPS/shelf
1672 IOPS/shelf x 2 = 3344 IOPS/controller
-or-
22 (23 - 1 spare) drives x 65 MB/s/drive = 1430 MB/s/shelf
1430 MB/s/shelf x 2 = 2860 MB/s controller

Pure streamed read IO scenario  (theoretical numbers):
0.0 avg seek time + 4.2ms avg latency = 4.2 ms avg access time
1/0.0042 = 238 IOPS/drive
22 (23 - 1 spare) drives x 238 IOPS/drive = 5236 IOPS/shelf
5236 IOPS/shelf x 2 = 10472 IOPS/controller
-or-
22 (23 - 1 spare) drives x 112 MB/s/drive = 2464 MB/s/shelf
2464 MB/s/shelf x 2 = 4928 MB/s controller

Max. bandwidth of a single SAS PHY interface = 270MB/s per port (300MB/s -
overhead)

LSISAS3801E has 2 x 4-port SAS connections. Each shelf gets a 4-port
connection, so:

Max controller bandwidth/shelf = 4 x 270 MB/s = 1080 MB/s
Max controller bandwidth = 2 x 1080 MB/s = 2160 MB/s

Max. bandwidth of PCIe x8 interface = 2GB/s
Typical sustained bandwidth of PCIe x8 interface (max - 5% overhead)=
1.9GB/s
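
For anyone who wants to re-run that arithmetic, the figures above reduce to a 
quick one-liner (nothing assumed beyond the numbers already listed):

  awk 'BEGIN {
      iops_rand = int(1 / (0.0089 + 0.0042));  # avg seek + avg latency -> ~76 IOPS/drive
      iops_seq  = int(1 / 0.0042);             # avg latency only       -> ~238 IOPS/drive
      printf "random:   %d IOPS/shelf, %d MB/s/shelf\n", 22 * iops_rand, 22 * 65;
      printf "streamed: %d IOPS/shelf, %d MB/s/shelf\n", 22 * iops_seq, 22 * 112;
      printf "per 4-lane SAS port: %d MB/s; per 3801E: %d MB/s\n", 4 * 270, 2 * 4 * 270;
  }'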

Summary:

Current controller cannot handle max IO load of even random IO scenario
(1430 MB/s per shelf needed, controller can only handle 1080 MB/s per
shelf). Also, PCIe bus can't push more than 1.9 GB/s sustained over a
single slot, so we are limited by the single card.

Solution:

Connecting 2 x 4-port SAS connectors to one shelf (i.e. enabling split-mode) 
would get us 2160 MB/s per shelf. This would allow us to remove the controller 
as a bottleneck
for all but the extreme cached read scenario, but the PCIe bus would
still throttle us to 1.9 GB/s per slot. So, the controller could keep up
with the shelves, but the PCIe bus would have to wait sometimes which
may (?) be a healthier situation than overwhelming the controller.

To support two shelves per controller, we could use an LSISAS31601E (4 x 4-port 
SAS connectors) but we would hit the PCIe bus limitation again. Moving to two 
(or more?) separate PCIe-x8 cards would be best, but that would require us to 
alter our server platform.

Whew. Thoughts? Comments? Suggestions?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Markus Kovero
How do you estimate the needed queue depth if one has, say, 64 to 128 disks 
sitting behind the LSI?
Is it a bad idea to have a queue depth of 1?

Yours
Markus Kovero


From: zfs-discuss-boun...@opensolaris.org 
[zfs-discuss-boun...@opensolaris.org] on behalf of Richard Elling 
[richard.ell...@gmail.com]
Sent: 24 October 2009 7:36
To: Adam Cheal
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile

ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:

 Here is example of the pool config we use:

 # zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52
 2009
 config:

NAME STATE READ WRITE CKSUM
pool002  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t18d0  ONLINE   0 0 0
c9t17d0  ONLINE   0 0 0
c9t55d0  ONLINE   0 0 0
c9t13d0  ONLINE   0 0 0
c9t15d0  ONLINE   0 0 0
c9t16d0  ONLINE   0 0 0
c9t11d0  ONLINE   0 0 0
c9t12d0  ONLINE   0 0 0
c9t14d0  ONLINE   0 0 0
c9t9d0   ONLINE   0 0 0
c9t8d0   ONLINE   0 0 0
c9t10d0  ONLINE   0 0 0
c9t29d0  ONLINE   0 0 0
c9t28d0  ONLINE   0 0 0
c9t27d0  ONLINE   0 0 0
c9t23d0  ONLINE   0 0 0
c9t25d0  ONLINE   0 0 0
c9t26d0  ONLINE   0 0 0
c9t21d0  ONLINE   0 0 0
c9t22d0  ONLINE   0 0 0
c9t24d0  ONLINE   0 0 0
c9t19d0  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t30d0  ONLINE   0 0 0
c9t31d0  ONLINE   0 0 0
c9t32d0  ONLINE   0 0 0
c9t33d0  ONLINE   0 0 0
c9t34d0  ONLINE   0 0 0
c9t35d0  ONLINE   0 0 0
c9t36d0  ONLINE   0 0 0
c9t37d0  ONLINE   0 0 0
c9t38d0  ONLINE   0 0 0
c9t39d0  ONLINE   0 0 0
c9t40d0  ONLINE   0 0 0
c9t41d0  ONLINE   0 0 0
c9t42d0  ONLINE   0 0 0
c9t44d0  ONLINE   0 0 0
c9t45d0  ONLINE   0 0 0
c9t46d0  ONLINE   0 0 0
c9t47d0  ONLINE   0 0 0
c9t48d0  ONLINE   0 0 0
c9t49d0  ONLINE   0 0 0
c9t50d0  ONLINE   0 0 0
c9t51d0  ONLINE   0 0 0
c9t52d0  ONLINE   0 0 0
cache
  c8t2d0 ONLINE   0 0 0
  c8t3d0 ONLINE   0 0 0
spares
  c9t20d0AVAIL
  c9t43d0AVAIL

 errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
 config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s0  ONLINE   0 0 0
c8t1d0s0  ONLINE   0 0 0

 errors: No known data errors

 ...and here is a snapshot of the system using iostat -indexC 5
 during a scrub of pool002 (c8 is onboard AHCI controller, c9 is
 LSI SAS 3801E):

                  extended device statistics                ---- errors ----
     r/s    w/s     kr/s   kw/s wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
     0.0    0.0      0.0    0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
  8738.7    0.0 555346.1    0.0  0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9

You see 345 entries in the active queue. If the controller rolls over at
511 active entries, then it would explain why it would soon begin to
have difficulty.

Meanwhile, it is providing 8,738 IOPS and 555 MB/sec, which is quite
respectable.

   194.8    0.0  11936.9    0.0  0.0   7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0

These disks are doing almost 200 read IOPS, but are not 100% busy.
Average I/O size is 66 KB, which is not bad (lots of little I/Os could be
worse), but at only 11.9 MB/s, you are not near the media bandwidth.
Average service time is 40.3 milliseconds, which

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Adam Cheal
The iostat I posted previously was from a system we had already tuned the 
zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10 in 
actv per disk).
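
(For reference, this tuning lives in /etc/system and takes effect after a 
reboot; a minimal sketch follows, with the live value checked via mdb. The 
value shown is just the one used in the test below:)

  * /etc/system: throttle ZFS's per-vdev I/O queue depth
  set zfs:zfs_vdev_max_pending = 7

  # after reboot, confirm the running value
  echo zfs_vdev_max_pending/D | mdb -k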

I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat 
output showed busier disks (%b is higher, which seemed odd) but a cap of about 
7 queue items per disk, proving the tuning was effective. iostat at a 
high-water mark during the test looked like this:

extended device statistics  
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c8
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t3d0
 8344.50.0 359640.40.0  0.1 300.50.0   36.0   0 4362 c9
  190.00.0 6800.40.0  0.0  6.60.0   34.8   0  99 c9t8d0
  185.00.0 6917.10.0  0.0  6.10.0   32.9   0  94 c9t9d0
  187.00.0 6640.90.0  0.0  6.50.0   34.6   0  98 c9t10d0
  186.50.0 6543.40.0  0.0  7.00.0   37.5   0 100 c9t11d0
  180.50.0 7203.10.0  0.0  6.70.0   37.2   0 100 c9t12d0
  195.50.0 7352.40.0  0.0  7.00.0   35.8   0 100 c9t13d0
  188.00.0 6884.90.0  0.0  6.60.0   35.2   0  99 c9t14d0
  204.00.0 6990.10.0  0.0  7.00.0   34.3   0 100 c9t15d0
  199.00.0 7336.70.0  0.0  7.00.0   35.2   0 100 c9t16d0
  180.50.0 6837.90.0  0.0  7.00.0   38.8   0 100 c9t17d0
  198.00.0 7668.90.0  0.0  7.00.0   35.3   0 100 c9t18d0
  203.00.0 7983.20.0  0.0  7.00.0   34.5   0 100 c9t19d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t20d0
  195.50.0 7096.40.0  0.0  6.70.0   34.1   0  98 c9t21d0
  189.50.0 7757.20.0  0.0  6.40.0   33.9   0  97 c9t22d0
  195.50.0 7645.90.0  0.0  6.60.0   33.8   0  99 c9t23d0
  194.50.0 7925.90.0  0.0  7.00.0   36.0   0 100 c9t24d0
  188.50.0 6725.60.0  0.0  6.20.0   32.8   0  94 c9t25d0
  188.50.0 7199.60.0  0.0  6.50.0   34.6   0  98 c9t26d0
  196.00.0 .90.0  0.0  6.30.0   32.1   0  95 c9t27d0
  193.50.0 7455.40.0  0.0  6.20.0   32.0   0  95 c9t28d0
  189.00.0 7400.90.0  0.0  6.30.0   33.2   0  96 c9t29d0
  182.50.0 9397.00.0  0.0  7.00.0   38.3   0 100 c9t30d0
  192.50.0 9179.50.0  0.0  7.00.0   36.3   0 100 c9t31d0
  189.50.0 9431.80.0  0.0  7.00.0   36.9   0 100 c9t32d0
  187.50.0 9082.00.0  0.0  7.00.0   37.3   0 100 c9t33d0
  188.50.0 9368.80.0  0.0  7.00.0   37.1   0 100 c9t34d0
  180.50.0 9332.80.0  0.0  7.00.0   38.8   0 100 c9t35d0
  183.00.0 9690.30.0  0.0  7.00.0   38.2   0 100 c9t36d0
  186.00.0 9193.80.0  0.0  7.00.0   37.6   0 100 c9t37d0
  180.50.0 8233.40.0  0.0  7.00.0   38.8   0 100 c9t38d0
  175.50.0 9085.20.0  0.0  7.00.0   39.9   0 100 c9t39d0
  177.00.0 9340.00.0  0.0  7.00.0   39.5   0 100 c9t40d0
  175.50.0 8831.00.0  0.0  7.00.0   39.9   0 100 c9t41d0
  190.50.0 9177.80.0  0.0  7.00.0   36.7   0 100 c9t42d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t43d0
  196.00.0 9180.50.0  0.0  7.00.0   35.7   0 100 c9t44d0
  193.50.0 9496.80.0  0.0  7.00.0   36.2   0 100 c9t45d0
  187.00.0 8699.50.0  0.0  7.00.0   37.4   0 100 c9t46d0
  198.50.0 9277.00.0  0.0  7.00.0   35.2   0 100 c9t47d0
  185.50.0 9778.30.0  0.0  7.00.0   37.7   0 100 c9t48d0
  192.00.0 8384.20.0  0.0  7.00.0   36.4   0 100 c9t49d0
  198.50.0 8864.70.0  0.0  7.00.0   35.2   0 100 c9t50d0
  192.00.0 9369.80.0  0.0  7.00.0   36.4   0 100 c9t51d0
  182.50.0 8825.70.0  0.0  7.00.0   38.3   0 100 c9t52d0
  202.00.0 7387.90.0  0.0  7.00.0   34.6   0 100 c9t55d0

...and sure enough about 20 minutes into it I get this (bus reset?):

scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@21,0 (sd30):
   incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@1e,0 (sd27):
   incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0: IOC Operational.

During the bus reset, iostat output 

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Markus Kovero
We actually hit similar issues with LSI, but under normal workload rather than 
scrub; the result is the same, but it seems to choke on writes rather than 
reads, with suboptimal performance.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6891413

Anyway, we haven't experienced this _at all_ with the RE3 version of Western 
Digital disks.
Issues seem to pop up with 750GB Seagate and 1TB WD Black-series disks; so far 
2TB green WDs seem unaffected too, so might it be related to the disks' 
firmware and how they talk to the LSI?

Also, we noticed more severe timeouts (even on RE3 and 2TB WD green) if disks 
are not forced into SATA1 mode; I believe this is a known issue with newer 2TB 
disks and some other disk controllers and may be caused by bad cabling or 
connectivity.

We have also never witnessed this behaviour with SAS disks (Fujitsu, IBM...). 
All this happens with snv 118, 122, 123 and 125.

Yours
Markus Kovero


From: zfs-discuss-boun...@opensolaris.org 
[zfs-discuss-boun...@opensolaris.org] on behalf of Adam Cheal 
[ach...@pnimedia.com]
Sent: 24 October 2009 12:49
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile

The iostat I posted previously was from a system we had already tuned the 
zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10 in 
actv per disk).

I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat 
output showed busier disks (%b is higher, which seemed odd) but a cap of about 
7 queue items per disk, proving the tuning was effective. iostat at a 
high-water mark during the test looked like this:

extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c8
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t3d0
 8344.50.0 359640.40.0  0.1 300.50.0   36.0   0 4362 c9
  190.00.0 6800.40.0  0.0  6.60.0   34.8   0  99 c9t8d0
  185.00.0 6917.10.0  0.0  6.10.0   32.9   0  94 c9t9d0
  187.00.0 6640.90.0  0.0  6.50.0   34.6   0  98 c9t10d0
  186.50.0 6543.40.0  0.0  7.00.0   37.5   0 100 c9t11d0
  180.50.0 7203.10.0  0.0  6.70.0   37.2   0 100 c9t12d0
  195.50.0 7352.40.0  0.0  7.00.0   35.8   0 100 c9t13d0
  188.00.0 6884.90.0  0.0  6.60.0   35.2   0  99 c9t14d0
  204.00.0 6990.10.0  0.0  7.00.0   34.3   0 100 c9t15d0
  199.00.0 7336.70.0  0.0  7.00.0   35.2   0 100 c9t16d0
  180.50.0 6837.90.0  0.0  7.00.0   38.8   0 100 c9t17d0
  198.00.0 7668.90.0  0.0  7.00.0   35.3   0 100 c9t18d0
  203.00.0 7983.20.0  0.0  7.00.0   34.5   0 100 c9t19d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t20d0
  195.50.0 7096.40.0  0.0  6.70.0   34.1   0  98 c9t21d0
  189.50.0 7757.20.0  0.0  6.40.0   33.9   0  97 c9t22d0
  195.50.0 7645.90.0  0.0  6.60.0   33.8   0  99 c9t23d0
  194.50.0 7925.90.0  0.0  7.00.0   36.0   0 100 c9t24d0
  188.50.0 6725.60.0  0.0  6.20.0   32.8   0  94 c9t25d0
  188.50.0 7199.60.0  0.0  6.50.0   34.6   0  98 c9t26d0
  196.00.0 .90.0  0.0  6.30.0   32.1   0  95 c9t27d0
  193.50.0 7455.40.0  0.0  6.20.0   32.0   0  95 c9t28d0
  189.00.0 7400.90.0  0.0  6.30.0   33.2   0  96 c9t29d0
  182.50.0 9397.00.0  0.0  7.00.0   38.3   0 100 c9t30d0
  192.50.0 9179.50.0  0.0  7.00.0   36.3   0 100 c9t31d0
  189.50.0 9431.80.0  0.0  7.00.0   36.9   0 100 c9t32d0
  187.50.0 9082.00.0  0.0  7.00.0   37.3   0 100 c9t33d0
  188.50.0 9368.80.0  0.0  7.00.0   37.1   0 100 c9t34d0
  180.50.0 9332.80.0  0.0  7.00.0   38.8   0 100 c9t35d0
  183.00.0 9690.30.0  0.0  7.00.0   38.2   0 100 c9t36d0
  186.00.0 9193.80.0  0.0  7.00.0   37.6   0 100 c9t37d0
  180.50.0 8233.40.0  0.0  7.00.0   38.8   0 100 c9t38d0
  175.50.0 9085.20.0  0.0  7.00.0   39.9   0 100 c9t39d0
  177.00.0 9340.00.0  0.0  7.00.0   39.5   0 100 c9t40d0
  175.50.0 8831.00.0  0.0  7.00.0   39.9   0 100 c9t41d0
  190.50.0 9177.80.0  0.0  7.00.0   36.7   0 100 c9t42d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t43d0
  196.00.0 9180.50.0  0.0  7.00.0   35.7   0 100 c9t44d0
  193.50.0 9496.80.0  0.0  7.00.0   36.2   0 100 c9t45d0
  187.00.0 8699.50.0  0.0  7.00.0   37.4   0 100 c9t46d0
  198.50.0 9277.00.0  0.0  7.00.0   35.2   0 100 c9t47d0
  185.50.0 9778.30.0  0.0  7.00.0

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Tim Cook
On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal ach...@pnimedia.com wrote:

 The iostat I posted previously was from a system we had already tuned the
 zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10
 in actv per disk).

 I reset this value in /etc/system to 7, rebooted, and started a scrub.
 iostat output showed busier disks (%b is higher, which seemed odd) but a cap
 of about 7 queue items per disk, proving the tuning was effective. iostat at
 a high-water mark during the test looked like this:



 ...and sure enough about 20 minutes into it I get this (bus reset?):

 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@21,0 (sd30):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@1e,0 (sd27):
   incomplete read- retrying
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Rev. 8 LSI, Inc. 1068E found.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0 supports power management.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0: IOC Operational.

 During the bus reset, iostat output looked like this:


 During our previous testing, we had tried even setting this max_pending
 value down to 1, but we still hit the problem (albeit it took a little
 longer to hit it) and I couldn't find anything else I could set to throttle
 IO to the disk, hence the frustration.

 If you hadn't seen this output, would you say that 7 was a reasonable
 value for that max_pending queue for our architecture and should give the
 LSI controller in this situation enough breathing room to operate? If so, I
 *should* be able to scrub the disks successfully (ZFS isn't to blame) and
 therefore have to point the finger at the
 mpt-driver/LSI-firmware/disk-firmware instead.
 --


A little bit of searching google says:
http://downloadmirror.intel.com/17968/eng/ESRT2_IR_readme.txt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Tim Cook
On Sat, Oct 24, 2009 at 11:20 AM, Tim Cook t...@cook.ms wrote:



 On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal ach...@pnimedia.com wrote:

 The iostat I posted previously was from a system we had already tuned the
 zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10
 in actv per disk).

 I reset this value in /etc/system to 7, rebooted, and started a scrub.
 iostat output showed busier disks (%b is higher, which seemed odd) but a cap
 of about 7 queue items per disk, proving the tuning was effective. iostat at
 a high-water mark during the test looked like this:



 ...and sure enough about 20 minutes into it I get this (bus reset?):


 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@21,0 (sd30):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@1e,0 (sd27):
   incomplete read- retrying
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0(mpt0):
   Rev. 8 LSI, Inc. 1068E found.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0(mpt0):
   mpt0 supports power management.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0(mpt0):
   mpt0: IOC Operational.

 During the bus reset, iostat output looked like this:


 During our previous testing, we had tried even setting this max_pending
 value down to 1, but we still hit the problem (albeit it took a little
 longer to hit it) and I couldn't find anything else I could set to throttle
 IO to the disk, hence the frustration.

 If you hadn't seen this output, would you say that 7 was a reasonable
 value for that max_pending queue for our architecture and should give the
 LSI controller in this situation enough breathing room to operate? If so, I
 *should* be able to scrub the disks successfully (ZFS isn't to blame) and
 therefore have to point the finger at the
 mpt-driver/LSI-firmware/disk-firmware instead.
 --


 A little bit of searching google says:
 http://downloadmirror.intel.com/17968/eng/ESRT2_IR_readme.txt


Huh, good old keyboard shortcuts firing off emails before I'm done with
them.  Anyways, in that link, I found the following:
 3. Updated - to provide NCQ queue depth of 32 (was 8) on 1064e and 1068e
and 1078 internal-only controllers in IR and ESRT2 modes.

Then there's also this link from someone using a similar controller under
freebsd:
http://www.nabble.com/mpt-errors-QUEUE-FULL-EVENT,-freebsd-7.0-on-dell-1950-td20019090.html

It would make total sense if you're having issues and the default queue
depth for that controller is 8 per port.  Even setting it to 1 isn't going
to fix your issue if you've got 46 drives on one channel/port.
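
A back-of-the-envelope version of that point, using the 46-drive figure above 
and the zfs_vdev_max_pending values discussed earlier in the thread (just the 
multiplication, nothing more):

  awk 'BEGIN {
      n = 46;                                  # drives behind the one controller
      printf "max_pending 10 -> up to %d outstanding I/Os\n", n * 10;
      printf "max_pending  7 -> up to %d outstanding I/Os\n", n * 7;
      printf "max_pending  1 -> up to %d outstanding I/Os\n", n * 1;
  }'

The 7-per-disk case roughly lines up with the ~300 entries iostat showed in 
c9's actv column earlier, and even a cap of 1 still leaves dozens of commands 
in flight at the controller.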

Honestly I'm just taking shots in the dark though.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Richard Elling

more below...

On Oct 24, 2009, at 2:49 AM, Adam Cheal wrote:

The iostat I posted previously was from a system we had already  
tuned the zfs:zfs_vdev_max_pending depth down to 10 (as visible by  
the max of about 10 in actv per disk).


I reset this value in /etc/system to 7, rebooted, and started a  
scrub. iostat output showed busier disks (%b is higher, which seemed  
odd) but a cap of about 7 queue items per disk, proving the tuning  
was effective. iostat at a high-water mark during the test looked  
like this:


   extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t0d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t1d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t2d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t3d0
8344.50.0 359640.40.0  0.1 300.50.0   36.0   0 4362 c9
 190.00.0 6800.40.0  0.0  6.60.0   34.8   0  99 c9t8d0
 185.00.0 6917.10.0  0.0  6.10.0   32.9   0  94 c9t9d0
 187.00.0 6640.90.0  0.0  6.50.0   34.6   0  98 c9t10d0
 186.50.0 6543.40.0  0.0  7.00.0   37.5   0 100 c9t11d0
 180.50.0 7203.10.0  0.0  6.70.0   37.2   0 100 c9t12d0
 195.50.0 7352.40.0  0.0  7.00.0   35.8   0 100 c9t13d0
 188.00.0 6884.90.0  0.0  6.60.0   35.2   0  99 c9t14d0
 204.00.0 6990.10.0  0.0  7.00.0   34.3   0 100 c9t15d0
 199.00.0 7336.70.0  0.0  7.00.0   35.2   0 100 c9t16d0
 180.50.0 6837.90.0  0.0  7.00.0   38.8   0 100 c9t17d0
 198.00.0 7668.90.0  0.0  7.00.0   35.3   0 100 c9t18d0
 203.00.0 7983.20.0  0.0  7.00.0   34.5   0 100 c9t19d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c9t20d0
 195.50.0 7096.40.0  0.0  6.70.0   34.1   0  98 c9t21d0
 189.50.0 7757.20.0  0.0  6.40.0   33.9   0  97 c9t22d0
 195.50.0 7645.90.0  0.0  6.60.0   33.8   0  99 c9t23d0
 194.50.0 7925.90.0  0.0  7.00.0   36.0   0 100 c9t24d0
 188.50.0 6725.60.0  0.0  6.20.0   32.8   0  94 c9t25d0
 188.50.0 7199.60.0  0.0  6.50.0   34.6   0  98 c9t26d0
 196.00.0 .90.0  0.0  6.30.0   32.1   0  95 c9t27d0
 193.50.0 7455.40.0  0.0  6.20.0   32.0   0  95 c9t28d0
 189.00.0 7400.90.0  0.0  6.30.0   33.2   0  96 c9t29d0
 182.50.0 9397.00.0  0.0  7.00.0   38.3   0 100 c9t30d0
 192.50.0 9179.50.0  0.0  7.00.0   36.3   0 100 c9t31d0
 189.50.0 9431.80.0  0.0  7.00.0   36.9   0 100 c9t32d0
 187.50.0 9082.00.0  0.0  7.00.0   37.3   0 100 c9t33d0
 188.50.0 9368.80.0  0.0  7.00.0   37.1   0 100 c9t34d0
 180.50.0 9332.80.0  0.0  7.00.0   38.8   0 100 c9t35d0
 183.00.0 9690.30.0  0.0  7.00.0   38.2   0 100 c9t36d0
 186.00.0 9193.80.0  0.0  7.00.0   37.6   0 100 c9t37d0
 180.50.0 8233.40.0  0.0  7.00.0   38.8   0 100 c9t38d0
 175.50.0 9085.20.0  0.0  7.00.0   39.9   0 100 c9t39d0
 177.00.0 9340.00.0  0.0  7.00.0   39.5   0 100 c9t40d0
 175.50.0 8831.00.0  0.0  7.00.0   39.9   0 100 c9t41d0
 190.50.0 9177.80.0  0.0  7.00.0   36.7   0 100 c9t42d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c9t43d0
 196.00.0 9180.50.0  0.0  7.00.0   35.7   0 100 c9t44d0
 193.50.0 9496.80.0  0.0  7.00.0   36.2   0 100 c9t45d0
 187.00.0 8699.50.0  0.0  7.00.0   37.4   0 100 c9t46d0
 198.50.0 9277.00.0  0.0  7.00.0   35.2   0 100 c9t47d0
 185.50.0 9778.30.0  0.0  7.00.0   37.7   0 100 c9t48d0
 192.00.0 8384.20.0  0.0  7.00.0   36.4   0 100 c9t49d0
 198.50.0 8864.70.0  0.0  7.00.0   35.2   0 100 c9t50d0
 192.00.0 9369.80.0  0.0  7.00.0   36.4   0 100 c9t51d0
 182.50.0 8825.70.0  0.0  7.00.0   38.3   0 100 c9t52d0
 202.00.0 7387.90.0  0.0  7.00.0   34.6   0 100 c9t55d0

...and sure enough about 20 minutes into it I get this (bus reset?):

scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/ 
pci1000,3...@0/s...@34,0 (sd49):

  incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/ 
pci1000,3...@0/s...@21,0 (sd30):

  incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/ 
pci1000,3...@0/s...@1e,0 (sd27):

  incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0  
(mpt0):

  Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0  
(mpt0):

  mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0  
(mpt0):

  mpt0: IOC Operational.

During the bus reset, 

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Carson Gaspar

On 10/24/09 9:43 AM, Richard Elling wrote:


OK, here we see 4 I/Os pending outside of the host. The host has
sent them on and is waiting for them to return. This means they are
getting dropped either at the disk or somewhere between the disk
and the controller.

When this happens, the sd driver will time them out, try to clear
the fault by reset, and retry. In other words, the resets you see
are when the system tries to recover.

Since there are many disks with 4 stuck I/Os, I would lean towards
a common cause. What do these disks have in common? Firmware?
Do they share a SAS expander?


I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware 
1.28.02.00 in IT mode, but I (almost?) always had exactly 1 stuck I/O. Note 
that my disks were one per channel, no expanders. I have _not_ seen it since 
replacing those disks. So my money is on a bug in the LSI firmware, the drive 
firmware, the drive controller hardware, or some combination thereof.


Note that LSI has released firmware 1.29.00.00. Sadly I cannot find any 
documentation on what has changed. Downloadable from LSI at 
http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html?remote=1locale=EN


--
Carson




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Tim Cook
On Sat, Oct 24, 2009 at 12:30 PM, Carson Gaspar car...@taltos.org wrote:


 I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware
 1.28.02.00 in IT mode, but I (almost?) always had exactly 1 stuck I/O.
 Note that my disks were one per channel, no expanders. I have _not_ seen it
 since replacing those disks. So my money is on a bug in the LSI firmware,
 the drive firmware, the drive controller hardware, or some combination
 thereof.

 Note that LSI has released firmware 1.29.00.00. Sadly I cannot find any
 documentation on what has changed. Downloadable from LSI at
 http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html?remote=1locale=EN

 --
 Carson


Here's the closest I could find from some Intel release notes.  It came
from: ESRT2_IR_readme.txt and does mention the 1068e chipset, as well as
that firmware rev.



Package Information

FW and OpROM Package for Native SAS mode, IT/IR mode and Intel(R) Embedded
Server RAID Technology II

Package version: 2009.10.06
FW Version = 01.29.00 (includes fixed firmware settings)
BIOS (non-RAID) Version = 06.28.00
BIOS (SW RAID) Version = 08.09041155

Supported RAID modes: 0, 1, 1E, 10, 10E and 5 (activation key AXXRAKSW5
required for RAID 5 support)

Supported Intel(R) Server Boards and Systems:
 - S5000PSLSASR, S5000XVNSASR, S5000VSASASR, S5000VCLSASR, S5000VSFSASR
 - SR1500ALSASR, SR1550ALSASR, SR2500ALLXR, S5000PALR (with SAS I/O Module)
 - S5000PSLROMBR (SROMBSAS18E) without HW RAID activation key AXXRAK18E
installed (native SAS or SW RAID modes only) - for HW RAID mode separate
package is available
 - NSC2U, TIGW1U

Supported Intel(R) RAID controller (adapters):
- SASMF8I, SASWT4I, SASUC8I

Intel(R) SAS Entry RAID Module AXX4SASMOD, when inserted in below Intel(R)
Server Boards and Systems:
 - S5520HC / S5520HCV, S5520SC,S5520UR,S5500WB


Known Restrictions

1. The sasflash versions within this package don't support ESRTII
controllers.
2. The sasflash utility for Windows and Linux version within this package
only support Intel(R) IT/IR RAID controllers.  The sasflash utility for
Windows and Linux version within this package don't support sasflash -o -e 6
command.
3. The sasflash utility for DOS version doesn't support the Intel(R) Server
Boards and Systems due to BIOS limitation.  The DOS version sasflash might
still be supported on 3rd party server boards which don't have the BIOS
limitation.
4. No PCI 3.0 support
5. No Foreign Configuration Resolution Support
6. No RAID migration Support
7. No mixed RAID mode support ever
8. No Stop On Error support


Known Bugs

(1)
For Intel(R) chipset S5000P/S5000V/S5000X based server systems, please use
the 32 bit, non-EBC version of sasflash which is
SASFLASH_Ph17-1.22.00.00\sasflash_efi_bios32_rel\sasflash.efi, instead of
the ebc version of sasflash which is in the top package directory and also
in
SASFLASH_Ph17-1.22.00.00\sasflash_efi_ebc_rel\sasflash.efi.  The latter one
may return a wrong sas address with a sasflash -list command in the listed
systems.

(2)
LED behavior does not match between SES and SGPIO for some conditions
(documentation in process).

(3)
When in EFI Optimized Boot mode, the task bar is not displayed in EFI_BSD
after two volumes are created.

(4)
If a system is rebooted while a volume rebuild is in progress, the rebuild
will start over from the beginning.


Fixes/Updates

Version 2009.10.06
 1. Fixed - MP2 HDD fault LED stays on after rebuild completes
 2. Fixed - System hangs if drive hot-unplugged during stress

Version 2009.07.30
 1. Fixed - SES over i2c for 106x products
 2. Fixed - FW settings updated to support SES over i2c drive lights on
FALSASMP2.

Version 2009.06.15
 1. Fixed - SES over I2C issue for 1078IR.
 2. Updated - 1068e fw to fix SES over I2C on MP2 bug.
 3. Updated - to provide NCQ queue depth of 32 (was 8) on 1064e and 1068e
and 1078 internal-only controllers in IR and ESRT2 modes.
 4. Updated - Firmware to enable SES over I2C on AXX4SASMOD.
 5. Updated - Settings to provide better LED indicators for SGPIO.

Version 2008.12.11
 1. Fixed - Media can't boot from SATA DVD in some systems in Software RAID
(ESRT2) mode.
 2. Fixed - Incorrect RAID 5 ECC error handling in Ctrl+M

Version 2008.11.07
 1. Added support for - Enable ICH10 support
 2. Added support for - Software RAID5 to support ICH10R
 3. Added support for - Single Drive RAID 0 (IS) Volume
 4. Fixed - Resolved issue where user could not create a second volume
immediately following the deletion of a second volume.
 5. Fixed - Second hot spare status not shown when first hot spare is
inactive/missing

Version 2008.09.22
 1. Fixed - SWR:During hot PD removal and then quick reboot, not updating
the DDF correctly.

Version 2008.06.16
 1. Fixed - the issue with "The LED functions are not working inside the
OSes" for SWR5
 2. Fixed - 

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Adam Cheal
The controller connects to two disk shelves (expanders), one per port on the 
card. If you look back in the thread, you'll see our zpool config has one vdev 
per shelf. All of the disks are Western Digital (model WD1002FBYS-18A6B0) 1TB 
7.2K, firmware rev. 03.00C06. Without actually matching up the disks with 
stuck IOs, I am assuming they are all on the same vdev/shelf/controller port.

I communicated with LSI support directly regarding the v1.29 firmware update, 
and here's what they wrote back:

I have checked with our development team on this one. There are no release 
notes available as the functionality of the coding itself has not changed. This 
was a minor cleanup and the firmware was assigned a new phase number for these. 
There were no defects or added functionality in going from the P16 firmware to 
the P17 firmware.

Also, regarding the NCQ depth on the drives: I used LSIUTIL in expert mode 
and used options 13/14 to dump the following settings (which are all default):

Multi-pathing:  [0=Disabled, 1=Enabled, default is 0] 
SATA Native Command Queuing:  [0=Disabled, 1=Enabled, default is 1] 
SATA Write Caching:  [0=Disabled, 1=Enabled, default is 1] 
SATA Maximum Queue Depth:  [0 to 255, default is 32] 
Device Missing Report Delay:  [0 to 2047, default is 0] 
Device Missing I/O Delay:  [0 to 255, default is 0] 
Persistence:  [0=Disabled, 1=Enabled, default is 1] 
Physical mapping:  [0=None, 1=DirectAttach, 2=EnclosureSlot, default is 0]
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa

Hi Cindy,

I have a couple of questions about this issue :

  1. I have exactly the same LSI controller in another server running
 OpenSolaris snv_101b, and so far no errors like these ones were
 seen on that system
  2. Up to snv_118 I hadn't seen any problems; they only appeared with snv_125
  3. Isn't the Sun StorageTek SAS HBA an LSI OEM? If so, is it possible
 to know what firmware version that HBA is using?


Thank you,
Bruno

Cindy Swearingen wrote:

Hi Bruno,

I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to be displayed.

According to the 6694909 comments, this issue is documented in the
release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy


On 10/22/09 05:40, Bruno Sousa wrote:

Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started 
to see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the 
driver so that I now get more information that didn't appear in 
the past?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this 
behaviour:



1 MPT Port found

 Port Name      Chip Vendor/Type/Rev     MPT Rev  Firmware Rev  IOC
 1.  mpt0       LSI Logic SAS1068E B3      105        011a        0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa

Hi Adam,

How many disks and zpools/zfs's do you have behind that LSI?
I have a system with 22 disks and 4 zpools with around 30 zfs's, and so 
far it works like a charm, even during heavy load. The OpenSolaris 
release is snv_101b.


Bruno
Adam Cheal wrote:

Cindy: How can I view the bug report you referenced? Standard methods show me 
the bug number is valid (6694909) but no content or notes. We are having 
similar messages appear with snv_118 with a busy LSI controller, especially 
during scrubbing, and I'd be interested to see what they mentioned in that 
report. Also, the LSI firmware updates for the LSISAS3081E (the controller we 
use) don't usually come with release notes indicating what has changed in each 
firmware revision, so I'm not sure where they got that idea from.
  





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
Our config is:
OpenSolaris snv_118 x64
1 x LSISAS3801E controller
2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
Each of the two external ports on the LSI connects to a 23-disk JBOD. ZFS-wise 
we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD). Each zpool has 
one ZFS filesystem containing millions of files/directories. This data is 
served up via CIFS (kernel), which is why we went with snv_118 (first release 
post-2009.06 that had stable CIFS server). Like I mentioned to James, we know 
that the server won't be a star performance-wise especially because of the wide 
vdevs but it shouldn't hiccup under load either. A guaranteed way for us to 
cause these IO errors is to load up the zpool with about 30 TB of data (90% 
full) then scrub it. Within 30 minutes we start to see the errors, which 
usually evolves into failing disks (because of excessive retry errors) which 
just makes things worse.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Jeremy f
What bug# is this under? I'm having what I believe is the same problem. Is
it possible to just take the mpt driver from a prior build in the
meantime?
The below is from the load the zpool scrub creates. This is on a Dell T7400
workstation with a 1068E OEMed LSI. I updated the firmware to the newest
available from Dell. The errors follow whichever of the 4 drives has the
highest load.

Streaming doesn't seem to trigger it, as I can push 60 MiB a second to a
mirrored rpool all day; it's only when there are a lot of metadata
operations.


Oct 23 06:25:44 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:25:44 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:27:15 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:27:15 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:28:26 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:28:26 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:29:47 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:29:47 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:30:58 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:30:58 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:28 systurbo5   mpt_handle_event_sync: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:28 systurbo5   mpt_handle_event: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc


On Fri, Oct 23, 2009 at 7:13 AM, Adam Cheal ach...@pnimedia.com wrote:

 Our config is:
 OpenSolaris snv_118 x64
 1 x LSISAS3801E controller
 2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
 Each of the two external ports on the LSI connects to a 23-disk JBOD.
 ZFS-wise we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD).
 Each zpool has one ZFS filesystem containing millions of files/directories.
 This data is served up via CIFS (kernel), which is why we went with snv_118
 (first release post-2009.06 that had stable CIFS server). Like I mentioned
 to James, we know that the server won't be a star performance-wise
 especially because of the wide vdevs but it shouldn't hiccup under load
 either. A guaranteed way for us to cause these IO errors is to load up the
 zpool with about 30 TB of data (90% full) then scrub it. Within 30 minutes
 we start to see the errors, which usually evolves into failing disks
 (because of excessive retry errors) which just makes things worse.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Jeremy f
Sorry, running snv_123, indiana

On Fri, Oct 23, 2009 at 11:16 AM, Jeremy f rysh...@gmail.com wrote:

 What bug# is this under? I'm having what I believe is the same problem. Is
 it possible to just take the mpt driver from a prior build in the time
 being?
 The below is from the load the zpool scrub creates. This is on a dell t7400
 workstation with a 1068E oemed lsi. I updated the firmware to the newest
 available from dell. The errors follow whichever of the 4 drives has the
 highest load.

 Streaming doesn't seem to trigger it as I can push 60 MiB a second to a
 mirrored rpool all day, it's only when there are a lot of metadata
 operations.


 Oct 23 06:25:44 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:25:44 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:27:15 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:27:15 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:28:26 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:28:26 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:29:47 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:29:47 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:30:58 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:30:58 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:28 systurbo5   mpt_handle_event_sync: IOCStatus=0x8000,
 IOCLogInfo=0x31123000
 Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:28 systurbo5   mpt_handle_event: IOCStatus=0x8000,
 IOCLogInfo=0x31123000
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc


 On Fri, Oct 23, 2009 at 7:13 AM, Adam Cheal ach...@pnimedia.com wrote:

 Our config is:
 OpenSolaris snv_118 x64
 1 x LSISAS3801E controller
 2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
 Each of the two external ports on the LSI connects to a 23-disk JBOD.
 ZFS-wise we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD).
 Each zpool has one ZFS filesystem containing millions of files/directories.
 This data is served up via CIFS (kernel), which is why we went with snv_118
 (first release post-2009.06 that had stable CIFS server). Like I mentioned
 to James, we know that the server won't be a star performance-wise
 especially because of the wide vdevs but it shouldn't hiccup under load
 either. A guaranteed way for us to cause these IO errors is to load up the
 zpool with about 30 TB of data (90% full) then scrub it. Within 30 minutes
 we start to see the errors, which usually evolves into failing disks
 (because of excessive retry errors) which just makes things worse.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
Just submitted the bug yesterday, on James's advice, so I don't have a bug 
number to refer you to yet...the change request number is 6894775, if that 
helps or is directly related to the future bugid.

From what I've seen/read, this problem has been around for a while but only rears 
its ugly head under heavy IO with large filesets, probably related to the large 
metadata sets you spoke of. We are using snv_118 x64, but it seems to appear 
in snv_123 and snv_125 as well from what I read here.

We've tried installing SSDs to act as a read cache for the pool to reduce the 
metadata hits on the physical disks, and as a last-ditch effort we even tried 
switching to the latest LSI-supplied itmpt driver from 2007 (from reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and 
disabling the mpt driver, but we ended up with the same timeout issues. In our 
case, the drives in the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k 
SATA drives.

In revisiting our architecture, we compared it to Sun's x4540 Thumper offering, 
which uses the same controller with similar (though apparently customized) 
firmware and 48 disks. The difference is that they use 6 x LSI1068E controllers, 
each of which has to deal with only 8 disks...obviously better for performance, 
but this architecture could be hiding the real IO issue by distributing the IO 
across so many controllers.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread James C. McPherson

Adam Cheal wrote:

Just submitted the bug yesterday, under advice of James, so I don't have a number you can 
refer to you...the change request number is 6894775 if that helps or is 
directly related to the future bugid.


From what I seen/read this problem has been around for awhile but only rears 
its ugly head under heavy IO with large filesets, probably related to large 
metadata sets as you spoke of. We are using snv_118 x64 but it seems to appear 
in snv_123 and snv_125 as well from what I read here.


We've tried installing SSD's to act as a read-cache for the pool to reduce the metadata 
hits on the physical disks and as a last-ditch effort we even tried switching to the 
latest LSI-supplied itmpt driver from 2007 (from reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and disabling 
the mpt driver but we ended up with the same timeout issues. In our case, the drives in 
the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k SATA drives.

In revisting our architecture, we compared it to Sun's x4540 Thumper offering which uses 
the same controller with similar (though apparently customized) firmware and 48 disks. 
The difference is that they use 6 x LSI1068e controllers which each have to deal with 
only 8 disks...obviously better on performance but this architecture could be 
hiding the real IO issue by distributing the IO across so many controllers.


Hi Adam,
I was watching the incoming queues all day yesterday for the
bug, but missed seeing it, not sure why.

I've now moved the bug to the appropriate category so it will
get attention from the right people.


Thanks,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa
Could the reason Sun's x4540 Thumper has 6 LSIs be some sort of hidden 
problem found by Sun where the HBA resets, and due to time-to-market 
pressure the quick and dirty solution was to spread the load over 
multiple HBAs instead of a software fix?


Just my 2 cents..


Bruno


Adam Cheal wrote:

Just submitted the bug yesterday, under advice of James, so I don't have a number you can 
refer to you...the change request number is 6894775 if that helps or is 
directly related to the future bugid.

From what I seen/read this problem has been around for awhile but only rears 
its ugly head under heavy IO with large filesets, probably related to large 
metadata sets as you spoke of. We are using snv_118 x64 but it seems to appear in 
snv_123 and snv_125 as well from what I read here.

We've tried installing SSD's to act as a read-cache for the pool to reduce the metadata 
hits on the physical disks and as a last-ditch effort we even tried switching to the 
latest LSI-supplied itmpt driver from 2007 (from reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and disabling 
the mpt driver but we ended up with the same timeout issues. In our case, the drives in 
the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k SATA drives.

In revisting our architecture, we compared it to Sun's x4540 Thumper offering which uses 
the same controller with similar (though apparently customized) firmware and 48 disks. 
The difference is that they use 6 x LSI1068e controllers which each have to deal with 
only 8 disks...obviously better on performance but this architecture could be 
hiding the real IO issue by distributing the IO across so many controllers.
  




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa

Hi Cindy,

Thank you for the update, but it seems like I can't see any information 
specific to that bug.
I can only see bug numbers 6702538 and 6615564, but according to their 
history, they were fixed quite some time ago.

Can you by any chance share the information about bug 6694909?

Thank you,
Bruno


Cindy Swearingen wrote:

Hi Bruno,

I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to display.

According to the 6694909 comments, this issue is documented in the
release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy


On 10/22/09 05:40, Bruno Sousa wrote:

Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started 
to see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the 
driver so that I now get more information where in the past such 
information didn't appear?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this 
behaviour:



1 MPT Port found

Port Name Chip Vendor/Type/RevMPT Rev  Firmware Rev  IOC
1.  mpt0  LSI Logic SAS1068E B3 105  011a 0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  Link Up
 Invalid DWord Count  79,896,409
 Running Disparity Error Count   

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 1:48 PM, Bruno Sousa wrote:
Could the reason Sun's x4540 Thumper has 6 LSIs be some sort of
hidden problem found by Sun where the HBA resets, and due to
time-to-market pressure the quick and dirty solution was to spread
the load over multiple HBAs instead of a software fix?


I don't think so. X4540 has 48 disks -- 6 controllers at 8 disks/controller.
This is the same configuration as the X4500, which used a Marvell controller.

This decision leverages parts from the previous design.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 3:48 PM, Bruno Sousa bso...@epinfante.com wrote:

 Could the reason Sun's x4540 Thumper has 6 LSIs be some sort of hidden
 problem found by Sun where the HBA resets, and due to time-to-market pressure
 the quick and dirty solution was to spread the load over multiple HBAs
 instead of a software fix?

 Just my 2 cents..


 Bruno


What else were you expecting them to do?  According to LSI's website, the
1068e in an x8 configuration is an 8-port card.
http://www.lsi.com/DistributionSystem/AssetDocument/files/docs/marketing_docs/storage_stand_prod/SCG_LSISAS1068E_PB_040407.pdf

While they could've used expanders, that just creates one more component
that can fail/have issues.  Looking at the diagram, they've taken the
absolute shortest I/O path possible, which is what I would hope to
see/expect.
http://www.sun.com/servers/x64/x4540/server_architecture.pdf

One drive per channel, 6 channels total.

I also wouldn't be surprised to find out that they found this the optimal
configuration from a performance/throughput/IOPS perspective as well.  Can't
seem to find those numbers published by LSI.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
I don't think there was any intention on Sun's part to ignore the 
problem...obviously their target market wants a performance-oriented box and 
the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels 
= 1 channel per drive = no contention for channels. The x4540 is a monster and 
performs like a dream with snv_118 (we have a few ourselves).

My issue is that implementing an archival-type solution demands a dense, simple 
storage platform that performs at a reasonable level, nothing more. Our design 
has the same controller chip (8 SAS PHY channels) driving 46 disks, so there is 
bound to be contention there especially in high-load situations. I just need it 
to work and handle load gracefully, not timeout and cause disk failures; at 
this point I can't even scrub the zpools to verify the data we have on there is 
valid. From a hardware perspective, the 3801E card is spec'ed to handle our 
architecture; the OS just seems to fall over somewhere though and not be able 
to throttle itself in certain intensive IO situations.

That said, I don't know whether to point the finger at LSI's firmware or 
mpt-driver/ZFS. Sun obviously has a good relationship with LSI as their 1068E 
is the recommended SAS controller chip and is used in their own products. At 
least we've got a bug filed now, and we can hopefully follow this through to 
find out where the system breaks down.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal ach...@pnimedia.com wrote:

 I don't think there was any intention on Sun's part to ignore the
 problem...obviously their target market wants a performance-oriented box and
 the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY
 channels = 1 channel per drive = no contention for channels. The x4540 is a
 monster and performs like a dream with snv_118 (we have a few ourselves).

 My issue is that implementing an archival-type solution demands a dense,
 simple storage platform that performs at a reasonable level, nothing more.
 Our design has the same controller chip (8 SAS PHY channels) driving 46
 disks, so there is bound to be contention there especially in high-load
 situations. I just need it to work and handle load gracefully, not timeout
 and cause disk failures; at this point I can't even scrub the zpools to
 verify the data we have on there is valid. From a hardware perspective, the
 3801E card is spec'ed to handle our architecture; the OS just seems to fall
 over somewhere though and not be able to throttle itself in certain
 intensive IO situations.

 That said, I don't know whether to point the finger at LSI's firmware or
 mpt-driver/ZFS. Sun obviously has a good relationship with LSI as their
 1068E is the recommended SAS controller chip and is used in their own
 products. At least we've got a bug filed now, and we can hopefully follow
 this through to find out where the system breaks down.


Have you checked in with LSI to verify the IOPS ability of the chip?  Just
because it supports having 46 drives attached to one ASIC doesn't mean it
can actually service all 46 at once.  You're talking (VERY conservatively)
2800 IOPS.

Even ignoring that, I know for a fact that the chip can't handle raw
throughput numbers on 46 disks unless you've got some very severe raid
overhead.  That chip is good for roughly 2GB/sec each direction.  46 7200RPM
drives can fairly easily push 4x that amount in streaming IO loads.

Long story short, it appears you've got a 50lbs load in a 5lbs bag...

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
LSI's sales literature on that card specs 128 devices which I take with a few 
hearty grains of salt. I agree that with all 46 drives pumping out streamed 
data, the controller would be overworked BUT the drives will only deliver data 
as fast as the OS tells them to. Just because the speedometer says 200 mph max 
doesn't mean we should (or even can!) go that fast.

The IO intensive operations that trigger our timeout issues are a small 
percentage of the actual normal IO we do to the box. Most of the time the 
solution happily serves up archived data, but when it comes time to scrub or do 
mass operations on the entire dataset bad things happen. It seems a waste to 
architect a more expensive performance-oriented solution when you aren't going 
to use that performance the majority of the time. There is a balance between 
performance and functionality, but I still feel that we should be able to make 
this situation work.

Ideally, the OS could dynamically adapt to slower storage and throttle its IO 
requests accordingly. At the least, it could allow the user to specify some IO 
thresholds so we can cage the beast if need be. We've tried some manual 
tuning via kernel parameters to restrict max queued operations per vdev and 
also a scrub related one (specifics escape me), but it still manages to 
overload itself.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 4:46 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal ach...@pnimedia.com  
wrote:
I don't think there was any intention on Sun's part to ignore the  
problem...obviously their target market wants a performance-oriented  
box and the x4540 delivers that. Each 1068E controller chip supports  
8 SAS PHY channels = 1 channel per drive = no contention for  
channels. The x4540 is a monster and performs like a dream with  
snv_118 (we have a few ourselves).


My issue is that implementing an archival-type solution demands a  
dense, simple storage platform that performs at a reasonable level,  
nothing more. Our design has the same controller chip (8 SAS PHY  
channels) driving 46 disks, so there is bound to be contention there  
especially in high-load situations. I just need it to work and  
handle load gracefully, not timeout and cause disk failures; at  
this point I can't even scrub the zpools to verify the data we have  
on there is valid. From a hardware perspective, the 3801E card is  
spec'ed to handle our architecture; the OS just seems to fall over  
somewhere though and not be able to throttle itself in certain  
intensive IO situations.


That said, I don't know whether to point the finger at LSI's  
firmware or mpt-driver/ZFS. Sun obviously has a good relationship  
with LSI as their 1068E is the recommended SAS controller chip and  
is used in their own products. At least we've got a bug filed now,  
and we can hopefully follow this through to find out where the  
system breaks down.



Have you checked in with LSI to verify the IOPS ability of the  
chip?  Just because it supports having 46 drives attached to one  
ASIC doesn't mean it can actually service all 46 at once.  You're  
talking (VERY conservatively) 2800 IOPS.


Tim has a valid point. By default, ZFS will queue 35 commands per disk.
For 46 disks that is 1,610 concurrent I/Os.  Historically, it has proven to be
relatively easy to crater performance or cause problems with very, very,
very expensive arrays that are easily overrun by Solaris. As a result, it is
not uncommon to see references to setting throttles, especially in older docs.


Fortunately, this is simple to test by reducing the number of I/Os ZFS
will queue.  See the Evil Tuning Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

The mpt source is not open, so the mpt driver's reaction to 1,610 concurrent
I/Os can only be guessed from afar -- public LSI docs mention a number of 511
concurrent I/Os for SAS1068, but it is not clear to me that is an explicit
limit.  If you have success with zfs_vdev_max_pending set to 10, then the
mystery might be solved. Use iostat to observe the wait and actv columns,
which show the number of transactions in the queues.  JCMP?
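
For reference, a minimal sketch of how to run that experiment (assuming a
build where zfs_vdev_max_pending is still the live tunable; 10 is only a
starting point, not a recommendation):

   # temporary, on the live system (reverts at the next reboot)
   echo zfs_vdev_max_pending/W0t10 | mdb -kw

   # persistent alternative: put this line in /etc/system and reboot
   #   set zfs:zfs_vdev_max_pending = 10

   # then watch the wait/actv columns while the heavy I/O runs
   iostat -xn 5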

NB sometimes a driver will have the limit be configurable. For example, to get
high performance out of a high-end array attached to a qlc card, I've set
the execution-throttle in /kernel/drv/qlc.conf to be more than two orders of
magnitude greater than its default of 32. /kernel/drv/mpt*.conf does not seem
to have a similar throttle.
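
A sketch of what that qlc change looks like, for comparison (the value shown
is illustrative only, and it takes effect after a reboot):

   # /kernel/drv/qlc.conf
   execution-throttle=256;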
 -- richard

Even ignoring that, I know for a fact that the chip can't handle raw  
throughput numbers on 46 disks unless you've got some very severe  
raid overhead.  That chip is good for roughly 2GB/sec each  
direction.  46 7200RPM drives can fairly easily push 4x that amount  
in streaming IO loads.


Long story short, it appears you've got a 5lbs bag a 50lbs load...

--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 7:17 PM, Adam Cheal ach...@pnimedia.com wrote:

 LSI's sales literature on that card specs 128 devices which I take with a
 few hearty grains of salt. I agree that with all 46 drives pumping out
 streamed data, the controller would be overworked BUT the drives will only
 deliver data as fast as the OS tells them to. Just because the speedometer
 says 200 mph max doesn't mean we should (or even can!) go that fast.

 The IO intensive operations that trigger our timeout issues are a small
 percentage of the actual normal IO we do to the box. Most of the time the
 solution happily serves up archived data, but when it comes time to scrub or
 do mass operations on the entire dataset bad things happen. It seems a waste
 to architect a more expensive performance-oriented solution when you aren't
 going to use that performance the majority of the time. There is a balance
 between performance and functionality, but I still feel that we should be
 able to make this situation work.

 Ideally, the OS could dynamically adapt to slower storage and throttle its
 IO requests accordingly. At the least, it could allow the user to specify
 some IO thresholds so we can cage the beast if need be. We've tried some
 manual tuning via kernel parameters to restrict max queued operations per
 vdev and also a scrub related one (specifics escape me), but it still
 manages to overload itself.
 --


Where are you planning on queueing up those requests?  The scrub, I can
understand wanting throttling, but what about your user workload?  Unless
you're talking about EXTREMELY  short bursts of I/O, what do you suggest the
OS do?  If you're sending 3000 IOPS at the box from a workstation, where is
that workload going to sit if you're only dumping 500 IOPS to disk?  The
only thing that will change is that your client will time out instead of your
disks.

I don't recall seeing what generates the I/O, but I do recall that it's
backup.  My assumption would be it's something coming in over the network,
in which case I'd say you're far, far better off throttling at the network
stack.
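
On a Crossbow-era build (snv_105 or later) that can be a one-liner; a sketch,
with the link name as a placeholder for whatever NIC carries the traffic:

   # cap the data link's bandwidth (value and link name are placeholders;
   # maxbw takes bits/sec with K/M/G suffixes)
   dladm set-linkprop -p maxbw=500M e1000g0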

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling richard.ell...@gmail.comwrote:


 Tim has a valid point. By default, ZFS will queue 35 commands per disk.
 For 46 disks that is 1,610 concurrent I/Os.  Historically, it has proven to
 be
 relatively easy to crater performance or cause problems with very, very,
 very expensive arrays that are easily overrun by Solaris. As a result, it
 is
 not uncommon to see references to setting throttles, especially in older
 docs.

 Fortunately, this is  simple to test by reducing the number of I/Os ZFS
 will queue.  See the Evil Tuning Guide

 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

 The mpt source is not open, so the mpt driver's reaction to 1,610
 concurrent
 I/Os can only be guessed from afar -- public LSI docs mention a number of
 511
 concurrent I/Os for SAS1068, but it is not clear to me that is an explicit
 limit.  If
 you have success with zfs_vdev_max_pending set to 10, then the mystery
 might be solved. Use iostat to observe the wait and actv columns, which
 show the number of transactions in the queues.  JCMP?

 NB sometimes a driver will have the limit be configurable. For example, to
 get
 high performance out of a high-end array attached to a qlc card, I've set
 the execution-throttle in /kernel/drv/qlc.conf to be more than two orders
 of
 magnitude greater than its default of 32. /kernel/drv/mpt*.conf does not
 seem
 to have a similar throttle.
  -- richard



I believe there's a caveat here though.  That really only helps if the total
I/O load is actually within what the controller can handle.  If the sustained
I/O workload is still 1600 concurrent I/Os, lowering the batch won't
actually make any difference in the timeouts, will it?  It would obviously
eliminate burstiness (yes, I made that word up), but if the total sustained
I/O load is greater than the ASIC can handle, it's still going to fall over
and die with a queue of 10, correct?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
And therein lies the issue. The excessive load that causes the IO issues is 
almost always generated locally from a scrub or a local recursive ls used to 
warm up the SSD-based zpool cache with metadata. The regular network IO to the 
box is minimal and is very read-centric; once we load the box up with archived 
data (which generally happens in a short amount of time), we simply serve it 
out as needed.

As far as queueing goes, I would expect the system to queue bursts of IO in 
memory with appropriate timeouts, as required. These timeouts could either be 
manually or auto-magically adjusted to deal with the slower storage hardware. 
Obviously sustained intense IO requests would eventually blow up the queue so 
the goal here is to avoid creating those situations in the first place. We can 
throttle the network IO, if needed; I need the OS to know its own local IO 
boundaries though and not attempt to overwork itself during scrubs etc.
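
One knob that may help here, if it exists on this build (I haven't verified it
on snv_118), is the zfs_scrub_limit tunable, which caps in-flight scrub I/Os
per top-level vdev; a hypothetical /etc/system sketch, value illustrative only:

   * assumes the zfs_scrub_limit tunable is present; requires a reboot
   set zfs:zfs_scrub_limit = 4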
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 5:32 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling richard.ell...@gmail.com 
 wrote:


Tim has a valid point. By default, ZFS will queue 35 commands per  
disk.
For 46 disks that is 1,610 concurrent I/Os.  Historically, it has  
proven to be
relatively easy to crater performance or cause problems with very,  
very,
very expensive arrays that are easily overrun by Solaris. As a  
result, it is
not uncommon to see references to setting throttles, especially in  
older docs.


Fortunately, this is  simple to test by reducing the number of I/Os  
ZFS

will queue.  See the Evil Tuning Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

The mpt source is not open, so the mpt driver's reaction to 1,610  
concurrent
I/Os can only be guessed from afar -- public LSI docs mention a  
number of 511
concurrent I/Os for SAS1068, but it is not clear to me that is an  
explicit limit.  If

you have success with zfs_vdev_max_pending set to 10, then the mystery
might be solved. Use iostat to observe the wait and actv columns,  
which

show the number of transactions in the queues.  JCMP?

NB sometimes a driver will have the limit be configurable. For  
example, to get
high performance out of a high-end array attached to a qlc card,  
I've set
the execution-throttle in /kernel/drv/qlc.conf to be more than two  
orders of
magnitude greater than its default of 32. /kernel/drv/mpt*.conf does  
not seem

to have a similar throttle.
 -- richard



I believe there's a caveat here though.  That really only helps if  
the total I/O load is actually enough for the controller to handle.   
If the sustained I/O workload is still 1600 concurrent I/O's,  
lowering the batch won't actually cause any difference in the  
timeouts, will it?  It would obviously eliminate burstiness (yes, I  
made that word up), but if the total sustained I/O load is greater  
than the ASIC can handle, it's still going to fall over and die with  
a queue of 10, correct?


Yes, but since they are disks, and I'm assuming HDDs here, there is no
chance the disks will be faster than the host's ability to send I/Os ;-)
iostat will show what the queues look like.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
Here is an example of the pool config we use:

# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

NAME STATE READ WRITE CKSUM
pool002  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t18d0  ONLINE   0 0 0
c9t17d0  ONLINE   0 0 0
c9t55d0  ONLINE   0 0 0
c9t13d0  ONLINE   0 0 0
c9t15d0  ONLINE   0 0 0
c9t16d0  ONLINE   0 0 0
c9t11d0  ONLINE   0 0 0
c9t12d0  ONLINE   0 0 0
c9t14d0  ONLINE   0 0 0
c9t9d0   ONLINE   0 0 0
c9t8d0   ONLINE   0 0 0
c9t10d0  ONLINE   0 0 0
c9t29d0  ONLINE   0 0 0
c9t28d0  ONLINE   0 0 0
c9t27d0  ONLINE   0 0 0
c9t23d0  ONLINE   0 0 0
c9t25d0  ONLINE   0 0 0
c9t26d0  ONLINE   0 0 0
c9t21d0  ONLINE   0 0 0
c9t22d0  ONLINE   0 0 0
c9t24d0  ONLINE   0 0 0
c9t19d0  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t30d0  ONLINE   0 0 0
c9t31d0  ONLINE   0 0 0
c9t32d0  ONLINE   0 0 0
c9t33d0  ONLINE   0 0 0
c9t34d0  ONLINE   0 0 0
c9t35d0  ONLINE   0 0 0
c9t36d0  ONLINE   0 0 0
c9t37d0  ONLINE   0 0 0
c9t38d0  ONLINE   0 0 0
c9t39d0  ONLINE   0 0 0
c9t40d0  ONLINE   0 0 0
c9t41d0  ONLINE   0 0 0
c9t42d0  ONLINE   0 0 0
c9t44d0  ONLINE   0 0 0
c9t45d0  ONLINE   0 0 0
c9t46d0  ONLINE   0 0 0
c9t47d0  ONLINE   0 0 0
c9t48d0  ONLINE   0 0 0
c9t49d0  ONLINE   0 0 0
c9t50d0  ONLINE   0 0 0
c9t51d0  ONLINE   0 0 0
c9t52d0  ONLINE   0 0 0
cache
  c8t2d0 ONLINE   0 0 0
  c8t3d0 ONLINE   0 0 0
spares
  c9t20d0AVAIL   
  c9t43d0AVAIL   

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s0  ONLINE   0 0 0
c8t1d0s0  ONLINE   0 0 0

errors: No known data errors

...and here is a snapshot of the system using iostat -indexC 5 during a scrub 
of pool002 (c8 is onboard AHCI controller, c9 is LSI SAS 3801E):

                     extended device statistics                                  ---- errors ----
    r/s    w/s     kr/s   kw/s  wait   actv  wsvc_t  asvc_t  %w    %b  s/w  h/w  trn  tot  device
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t0d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t1d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t2d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t3d0
 8738.7    0.0 555346.1    0.0   0.1  345.0     0.0    39.5   0  3875    0    1    1    2  c9
  194.8    0.0  11936.9    0.0   0.0    7.9     0.0    40.3   0    87    0    0    0    0  c9t8d0
  194.6    0.0  12927.9    0.0   0.0    7.6     0.0    38.9   0    86    0    0    0    0  c9t9d0
  194.6    0.0  12622.6    0.0   0.0    8.1     0.0    41.7   0    90    0    0    0    0  c9t10d0
  201.6    0.0  13350.9    0.0   0.0    8.0     0.0    39.5   0    90    0    0    0    0  c9t11d0
  194.4    0.0  12902.3    0.0   0.0    7.8     0.0    40.1   0    88    0    0    0    0  c9t12d0
  194.6    0.0  12902.3    0.0   0.0    7.7     0.0    39.3   0    88    0    0    0    0  c9t13d0
  195.4    0.0  12479.0    0.0   0.0    8.5     0.0    43.4   0    92    0    0    0    0  c9t14d0
  197.6    0.0  13107.4    0.0   0.0    8.1     0.0    41.0   0    92    0    0    0    0  c9t15d0
  198.8    0.0  12918.1    0.0   0.0    8.2     0.0    41.4   0    92    0    0    0    0  c9t16d0
  201.0    0.0  13350.3    0.0   0.0    8.1     0.0    40.4   0    91    0    0    0    0  c9t17d0
  201.2    0.0  13325.0    0.0   0.0    7.8     0.0    38.5   0    88    0    0    0    0  c9t18d0
  200.6    0.0  13021.5    0.0   0.0    8.2     0.0    40.7   0    91    0    0    0    0  c9t19d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c9t20d0
  196.6    0.0  12991.9

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:


Here is example of the pool config we use:

# zpool status
 pool: pool002
state: ONLINE
scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52  
2009

config:

   NAME STATE READ WRITE CKSUM
   pool002  ONLINE   0 0 0
 raidz2 ONLINE   0 0 0
   c9t18d0  ONLINE   0 0 0
   c9t17d0  ONLINE   0 0 0
   c9t55d0  ONLINE   0 0 0
   c9t13d0  ONLINE   0 0 0
   c9t15d0  ONLINE   0 0 0
   c9t16d0  ONLINE   0 0 0
   c9t11d0  ONLINE   0 0 0
   c9t12d0  ONLINE   0 0 0
   c9t14d0  ONLINE   0 0 0
   c9t9d0   ONLINE   0 0 0
   c9t8d0   ONLINE   0 0 0
   c9t10d0  ONLINE   0 0 0
   c9t29d0  ONLINE   0 0 0
   c9t28d0  ONLINE   0 0 0
   c9t27d0  ONLINE   0 0 0
   c9t23d0  ONLINE   0 0 0
   c9t25d0  ONLINE   0 0 0
   c9t26d0  ONLINE   0 0 0
   c9t21d0  ONLINE   0 0 0
   c9t22d0  ONLINE   0 0 0
   c9t24d0  ONLINE   0 0 0
   c9t19d0  ONLINE   0 0 0
 raidz2 ONLINE   0 0 0
   c9t30d0  ONLINE   0 0 0
   c9t31d0  ONLINE   0 0 0
   c9t32d0  ONLINE   0 0 0
   c9t33d0  ONLINE   0 0 0
   c9t34d0  ONLINE   0 0 0
   c9t35d0  ONLINE   0 0 0
   c9t36d0  ONLINE   0 0 0
   c9t37d0  ONLINE   0 0 0
   c9t38d0  ONLINE   0 0 0
   c9t39d0  ONLINE   0 0 0
   c9t40d0  ONLINE   0 0 0
   c9t41d0  ONLINE   0 0 0
   c9t42d0  ONLINE   0 0 0
   c9t44d0  ONLINE   0 0 0
   c9t45d0  ONLINE   0 0 0
   c9t46d0  ONLINE   0 0 0
   c9t47d0  ONLINE   0 0 0
   c9t48d0  ONLINE   0 0 0
   c9t49d0  ONLINE   0 0 0
   c9t50d0  ONLINE   0 0 0
   c9t51d0  ONLINE   0 0 0
   c9t52d0  ONLINE   0 0 0
   cache
 c8t2d0 ONLINE   0 0 0
 c8t3d0 ONLINE   0 0 0
   spares
 c9t20d0AVAIL
 c9t43d0AVAIL

errors: No known data errors

 pool: rpool
state: ONLINE
scrub: none requested
config:

   NAME  STATE READ WRITE CKSUM
   rpool ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c8t0d0s0  ONLINE   0 0 0
   c8t1d0s0  ONLINE   0 0 0

errors: No known data errors

...and here is a snapshot of the system using iostat -indexC 5  
during a scrub of pool002 (c8 is onboard AHCI controller, c9 is  
LSI SAS 3801E):


                     extended device statistics                                  ---- errors ----
    r/s    w/s     kr/s   kw/s  wait   actv  wsvc_t  asvc_t  %w    %b  s/w  h/w  trn  tot  device
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t0d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t1d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t2d0
    0.0    0.0      0.0    0.0   0.0    0.0     0.0     0.0   0     0    0    0    0    0  c8t3d0
 8738.7    0.0 555346.1    0.0   0.1  345.0     0.0    39.5   0  3875    0    1    1    2  c9


You see 345 entries in the active queue. If the controller rolls over at
511 active entries, then it would explain why it would soon begin to
have difficulty.

Meanwhile, it is providing 8,738 IOPS and 555 MB/sec, which is quite
respectable.

  194.8    0.0  11936.9    0.0   0.0    7.9     0.0    40.3   0    87    0    0    0    0  c9t8d0


These disks are doing almost 200 read IOPS, but are not 100% busy.
Average I/O size is 66 KB, which is not bad, lots of little I/Os could be
worse, but at only 11.9 MB/s, you are not near the media bandwidth.
Average service time is 40.3 milliseconds, which is not super, but may
be reflective of contention in the channel.
So there is more capacity to accept I/O commands, but...

  194.6    0.0  12927.9    0.0   0.0    7.6     0.0    38.9   0    86    0    0    0    0  c9t9d0
  194.6    0.0  12622.6    0.0   0.0    8.1     0.0    41.7   0    90    0    0    0    0  c9t10d0
  201.6    0.0  13350.9    0.0   0.0    8.0     0.0    39.5   0    90    0    0    0    0  c9t11d0
  194.4    0.0  12902.3    0.0   0.0    7.8     0.0    40.1   0    88    0    0    0    0  c9t12d0
  194.6    0.0  12902.3    0.0   0.0    7.7     0.0    39.3   0    88    0    0    0    0  c9t13d0

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Cindy Swearingen

Hi Bruno,

I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to display.

According to the 6694909 comments, this issue is documented in the
release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy


On 10/22/09 05:40, Bruno Sousa wrote:

Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started to 
see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the 
driver so that I now get more information where in the past such 
information didn't appear?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this 
behaviour:



1 MPT Port found

Port Name Chip Vendor/Type/RevMPT Rev  Firmware Rev  IOC
1.  mpt0  LSI Logic SAS1068E B3 105  011a 0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  Link Up
 Invalid DWord Count  79,896,409
 Running Disparity Error Count76,199,329
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 4:  Link Up, No Errors

Expander (Handle 0009) Phy 5:  Link Up, No Errors

Expander (Handle 0009) Phy 6:  Link Up, No Errors

Expander (Handle 0009) Phy 7:  Link Up, No 

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Adam Cheal
Cindy: How can I view the bug report you referenced? Standard methods show me 
that the bug number is valid (6694909) but no content or notes. We are having 
similar messages appear with snv_118 with a busy LSI controller, especially 
during scrubbing, and I'd be interested to see what they mentioned in that 
report. Also, the LSI firmware updates for the LSISAS3081E (the controller we 
use) don't usually come with release notes indicating what has changed in each 
firmware revision, so I'm not sure where they got that idea from.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread James C. McPherson

Adam Cheal wrote:

Cindy: How can I view the bug report you referenced? Standard methods
show my the bug number is valid (6694909) but no content or notes. We are
having similar messages appear with snv_118 with a busy LSI controller,
especially during scrubbing, and I'd be interested to see what they
mentioned in that report. Also, the LSI firmware updates for the
LSISAS3081E (the controller we use) don't usually come with release notes
indicating what has changed in each firmware revision, so I'm not sure
where they got that idea from.



Hi Adam,
unfortunately, you can't see that bug from outside.

The evaluation from LSI is very clear that this is a firmware issue
rather than a driver issue, and is claimed to be fixed in

LSI BIOS v6.26.00 FW 1.27.02
(aka Phase 15)


cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Adam Cheal
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the 
recently released Phase 17 but it didn't help. All firmware NVRAM settings are 
default. Basically, when we put the disks behind this controller under load 
(e.g. scrubbing, recursive ls on large ZFS filesystem) we get this series of 
log entries that appear at random intervals:

scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42):
   incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0: IOC Operational.

It seems to be timing out accessing a disk, retrying, giving up and then doing 
a bus reset?

This is happening with random disks behind the controller and on multiple 
systems with the same hardware config. We are running snv_118 right now and were 
hoping this was some magic mpt-related bug that was going to be fixed in 
snv_125, but it doesn't look like it. The LSI3801E is driving 2 x 23-disk JBODs 
which, albeit a dense solution, it should be able to handle. We are also using 
wide raidz2 vdevs (22 disks each, one per JBOD), which admittedly is slower 
performance-wise, but the goal here is density, not performance. I would have 
hoped that the system would just slow down if there was IO contention, but 
not experience things like bus resets.

Your thoughts?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread James C. McPherson

Adam Cheal wrote:

James: We are running Phase 16 on our LSISAS3801E's, and have also tried
the recently released Phase 17 but it didn't help. All firmware NVRAM
settings are default. Basically, when we put the disks behind this
controller under load (e.g. scrubbing, recursive ls on large ZFS
filesystem) we get this series of log entries that appear at random
intervals:

...

It seems to be timing out accessing a disk, retrying, giving up and then
doing a bus reset?

This is happening with random disks behind the controller and on multiple
systems with the same hardware config. We are running snv_118 right now
and was hoping this was some magic mpt-related bug that was going to be
fixed in snv_125 but it doesn't look like it. The LSI3801E is driving 2 x
23-disk JBOD's which, albeit a dense solution, it should be able to
handle. We are also using wide raidz2 vdevs (22 disks each, one per JBOD)
which agreeably is slower performance-wise, but the goal here is density
not performance. I would have hoped that the system would just slow
down if there was IO contention, but not experience things like bus
resets.

Your thoughts?


ugh. New bug time - bugs.opensolaris.org, please select
Solaris / kernel / driver-mpt. In addition to the error
messages and description of when you see it, please provide
output from

cfgadm -lav
prtconf -v

I'll see that it gets moved to the correct group asap.


Cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Adam Cheal
I've filed the bug, but was unable to include the prtconf -v output as the 
comments field only accepted 15000 chars total. Let me know if there is 
anything else I can provide/do to help figure this problem out as it is 
essentially preventing us from doing any kind of heavy IO to these pools, 
including scrubbing.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Carson Gaspar

On 10/22/09 4:07 PM, James C. McPherson wrote:

Adam Cheal wrote:

It seems to be timing out accessing a disk, retrying, giving up and then
doing a bus reset?

...

ugh. New bug time - bugs.opensolaris.org, please select
Solaris / kernel / driver-mpt. In addition to the error
messages and description of when you see it, please provide
output from

cfgadm -lav
prtconf -v

I'll see that it gets moved to the correct group asap.


FYI this is very similar to the behaviour I was seeing with my directly attached 
SATA disks on snv_118 (see the list archives for my original messages). I have 
not yet seen the error since I replaced my Hitachi 500 GB disks with Seagate 
1.5TB disks, so it could very well have been some unfortunate LSI firmware / 
Hitachi drive firmware interaction.


carson:gandalf 0 $ gzcat /var/adm/messages.2.gz  | ggrep -4 mpt | tail -9
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] 
/p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0):

Oct  8 00:44:17 gandalf.taltos.org  Log info 0x3113 received for target 
1.
Oct  8 00:44:17 gandalf.taltos.org  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] 
/p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0):

Oct  8 00:44:17 gandalf.taltos.org  Log info 0x3113 received for target 
1.
Oct  8 00:44:17 gandalf.taltos.org  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] 
/p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0):

Oct  8 00:44:17 gandalf.taltos.org  Log info 0x3113 received for target 
1.
Oct  8 00:44:17 gandalf.taltos.org  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc


carson:gandalf 1 $ gzcat /var/adm/messages.2.gz  | sed -ne 's,^.*\(Log 
info\),\1,p' | sort -u

Log info 0x31110b00 received for target 7.
Log info 0x3113 received for target 0.
Log info 0x3113 received for target 1.
Log info 0x3113 received for target 2.
Log info 0x3113 received for target 3.
Log info 0x3113 received for target 4.
Log info 0x3113 received for target 6.
Log info 0x3113 received for target 7.
Log info 0x3114 received for target 0.
Log info 0x3114 received for target 1.
Log info 0x3114 received for target 2.
Log info 0x3114 received for target 3.
Log info 0x3114 received for target 4.
Log info 0x3114 received for target 6.
Log info 0x3114 received for target 7.

carson:gandalf 0 $ gzcat /var/adm/messages.2.gz  | sed -ne 
's,^.*\(scsi_status\),\1,p' | sort -u

scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss