Hello,
We have a backup strategy that involves mapping LUNs between a given
pair of hosts and copying data from one LUN (src) to another
LUN (dest). The src LUN sits on a SAN device, or sometimes on multiple
devices (zpool mirror). The src LUN holds a live MySQL database, which
will typically run for weeks without issue.
When we start the backup sequence, we map a previously unmapped LUN to
the DB host and issue the following commands:
root# cfgadm -al
(sleep 10)
root# luxadm probe
(sleep 10)
root# zpool import <pool_name>
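For illustration, the fixed 10-second sleeps could be replaced by polling until the pool actually shows up as importable. This is only a sketch under assumptions: wait_for_pool is a hypothetical helper name, the retry count and delay are arbitrary defaults, and it relies on `zpool import` with no arguments listing importable pools with a "pool: <name>" line. It assumes a POSIX shell (ksh or bash), since the legacy /bin/sh on Solaris 10 lacks $(( )).

```shell
# Sketch only: wait_for_pool is a hypothetical helper, not a Solaris command.
# Usage: wait_for_pool POOL [TRIES] [DELAY_SECONDS]
# Returns 0 once POOL is listed as importable, 1 if it never appears.
wait_for_pool() {
    pool=$1
    tries=${2:-12}   # assumed default: 12 attempts
    delay=${3:-5}    # assumed default: 5 seconds between attempts
    i=0
    while [ "$i" -lt "$tries" ]; do
        # With no arguments, `zpool import` only lists pools available
        # for import; it does not import anything.
        if zpool import 2>/dev/null | grep "pool: $pool\$" >/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

With that in place, the last step would become something like `wait_for_pool <pool_name> && zpool import <pool_name>`, run after cfgadm -al and luxadm probe. It only removes guesswork about timing; it does not address the path flapping itself.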
After importing, we perform some minor I/O on the dest LUN, such as
adding a symlink or removing some old configuration files. Then we
start an ibbackup of the database from the src LUN to the dest LUN, and
things go bad.
It's not completely consistent: sometimes the DB host crashes, and
sometimes we get chksum/read/write errors on the src LUN. Looking at
dmesg (when the host doesn't crash), we see all the LUN paths disappear
and then reappear, usually around 20 seconds later. Example output is
below. Each LUN has 2 paths out of the DB host and 4 paths on each
storage device, across two separate SANs.
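To make the flapping easier to see than by reading raw dmesg, the "Lun=N for target=X disappeared/reappeared" lines can be tallied per target and LUN. The following is a rough sketch: summarize_path_events is a hypothetical helper name, and it assumes each of those messages sits on one line in the log file (normally /var/adm/messages on Solaris). On Solaris 10, nawk may be needed in place of awk.

```shell
# Sketch only: summarize_path_events is a hypothetical helper.
# Usage: summarize_path_events FILE
# Prints one line per target/LUN pair:
#   <target> <lun> <times disappeared> <times reappeared>
summarize_path_events() {
    awk '
        /Lun=[0-9]+ for target=[0-9a-f]+ (dis|re)appeared/ {
            # Pull the LUN number and target ID out of the fcp message.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Lun=/)    lun = substr($i, 5)
                if ($i ~ /^target=/) tgt = substr($i, 8)
            }
            key = tgt " " lun
            seen[key] = 1
            if ($NF == "disappeared") gone[key]++
            else back[key]++
        }
        END {
            for (k in seen) printf "%s %d %d\n", k, gone[k], back[k]
        }
    ' "$1" | sort
}
```

Running `summarize_path_events /var/adm/messages` around the time of a backup would show which targets lose a LUN and whether it ever comes back, which may help tell a rescan side effect on the newly mapped LUN apart from a real path loss on the src LUN.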
The host usually crashes when it is not running with a zpool mirror,
which is apparently expected behavior in Sol10u4.
These hosts are x86_64 servers running Sol10u4, unpatched. They use
QLogic QLA2342 HBAs and the stock qlc driver. They are using MPxIO,
from what I can tell.
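One way to firm up the "from what I can tell" part is to read the global mpxio-disable setting for the FC port driver. This is a sketch under assumptions: mpxio_setting is a hypothetical helper name, it assumes the setting lives in /kernel/drv/fp.conf in the form mpxio-disable="no"; and it ignores any per-port overrides that may appear later in the same file.

```shell
# Sketch only: mpxio_setting is a hypothetical helper.
# Usage: mpxio_setting [FILE]
# Prints the global mpxio-disable value; "no" means MPxIO is enabled.
mpxio_setting() {
    conf=${1:-/kernel/drv/fp.conf}   # assumed location of the fp driver config
    # Commented-out lines start with '#', so the ^ anchor skips them.
    # If the setting appears more than once, report the last one.
    sed -n 's/^ *mpxio-disable *= *"\([a-z]*\)".*/\1/p' "$conf" | tail -1
}
```

The /scsi_vhci/... device paths in the log output are also consistent with MPxIO being active, so this check is mainly a sanity test of the configuration file.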
If anyone has any tips on troubleshooting, or knows of things we are
doing wrong, help would be appreciated.
Thanks,
Ethan
=======================================================
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.warning] WARNING:
/scsi_vhci/[EMAIL PROTECTED] (sd3):
Jun 3 15:00:33 dbhost Error for Command: write(10)
Error Level: Retryable
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Requested
Block: 186020890 Error Block: 186020890
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Vendor:
Pillar Serial Number:
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Sense Key: Unit
Attention
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] ASC: 0x3f
(reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun 3 15:00:33 dbhost Lun=2 for target=21f00 reappeared
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info] Target 0x21f00:
Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.warning] WARNING:
/scsi_vhci/[EMAIL PROTECTED] (sd3):
Jun 3 15:00:33 dbhost Error for Command: write(10)
Error Level: Retryable
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Requested
Block: 186020890 Error Block: 186020890
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Vendor:
Pillar Serial Number:
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Sense Key: Unit
Attention
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] ASC: 0x3f
(reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun 3 15:00:33 dbhost Lun=2 for target=11f00 reappeared
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info] Target 0x11f00:
Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:33 dbhost scsi: [ID 799468 kern.info] sd6 at scsi_vhci0:
name g000b080084001453, bus address g000b080084001453
Jun 3 15:00:33 dbhost genunix: [ID 936769 kern.info] sd6 is
/scsi_vhci/[EMAIL PROTECTED]
Jun 3 15:00:33 dbhost genunix: [ID 408114 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd6) online
Jun 3 15:00:33 dbhost genunix: [ID 834635 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: degraded, path
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fp1) to target
address: w2300000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:34 dbhost genunix: [ID 834635 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: optimal, path
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp0) to target
address: w2100000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:37 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun 3 15:00:37 dbhost Lun=2 for target=11e00 reappeared
Jun 3 15:00:37 dbhost scsi: [ID 243001 kern.info] Target 0x11e00:
Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:37 dbhost genunix: [ID 834635 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: optimal, path
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp0) to target
address: w2200000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:42 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun 3 15:00:42 dbhost Lun=3 for target=10e00 disappeared
Jun 3 15:00:42 dbhost scsi: [ID 243001 kern.info] Target 0x10e00:
Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:42 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun 3 15:00:42 dbhost offlining lun=3 (trace=0), target=10e00
(trace=b10101)
Jun 3 15:00:42 dbhost genunix: [ID 834635 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd5) multipath status: degraded, path
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp0) to target
address: w2200000b08040e20,3 is offline Load balancing: round-robin
Jun 3 15:00:47 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun 3 15:00:47 dbhost Lun=2 for target=21e00 reappeared
Jun 3 15:00:47 dbhost scsi: [ID 243001 kern.info] Target 0x21e00:
Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:47 dbhost genunix: [ID 834635 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: optimal, path
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fp1) to target
address: w2400000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:52 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun 3 15:00:52 dbhost Lun=3 for target=20e00 disappeared
Jun 3 15:00:52 dbhost scsi: [ID 243001 kern.info] Target 0x20e00:
Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:52 dbhost scsi: [ID 243001 kern.info]
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun 3 15:00:52 dbhost offlining lun=3 (trace=0), target=20e00
(trace=b10101)
Jun 3 15:00:52 dbhost genunix: [ID 408114 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd5) offline
Jun 3 15:00:52 dbhost genunix: [ID 834635 kern.info]
/scsi_vhci/[EMAIL PROTECTED] (sd5) multipath status: failed, path
/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fp1) to target
address: w2400000b08040e20,3 is offline Load balancing: round-robin
_______________________________________________
storage-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/storage-discuss