Hello,
We have a backup strategy that involves mapping LUNs between a given pair of hosts and copying data from one LUN (src) to another LUN (dest). The src LUNs sit on a SAN device, sometimes on multiple devices (zpool mirror). The src LUN holds a MySQL database, which typically runs for weeks without issue.

When we start the backup sequence, we map a previously unmapped LUN to the DB host and issue the following commands:

root# cfgadm -al
(sleep 10)
root# luxadm probe
(sleep 10)
root# zpool import <pool_name>
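The sequence above can be collected into a small shell function so every backup run performs exactly the same steps. This is a minimal sketch, assuming a POSIX shell and that the two 10-second sleeps are intended as settle time; the function name `import_backup_pool` and the `CFGADM`, `LUXADM`, `ZPOOL` and `SETTLE` variables are hypothetical, added only so the real commands can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch of the manual discovery/import sequence as a shell function.
# CFGADM, LUXADM, ZPOOL and SETTLE are hypothetical variables added here
# so the real commands can be replaced with a stub for a dry run.
: "${CFGADM:=cfgadm}"
: "${LUXADM:=luxadm}"
: "${ZPOOL:=zpool}"
: "${SETTLE:=10}"  # seconds between steps, matching the manual sleeps

import_backup_pool() {
    pool=${1:-}
    if [ -z "$pool" ]; then
        echo "usage: import_backup_pool <pool_name>" >&2
        return 2
    fi
    # Step 1: list attachment points, as in the manual sequence.
    $CFGADM -al || return 1
    sleep "$SETTLE"
    # Step 2: probe for Fibre Channel devices.
    $LUXADM probe || return 1
    sleep "$SETTLE"
    # Step 3: import the pool on the newly mapped (dest) LUN.
    $ZPOOL import "$pool"
}
```

Because each step returns on a non-zero exit status, a failed `cfgadm` or `luxadm probe` stops the sequence before `zpool import` runs.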

After importing, we perform some minor I/O on the dest LUN, such as adding a symlink and removing some old configuration files. Then we start an ibbackup of the database from the src LUN to the dest LUN, and that is when things go bad.

The failure is not completely consistent: sometimes the DB host crashes, and sometimes we get checksum/read/write errors on the src LUN. Looking at dmesg (when the host doesn't crash), we see all of the LUN paths disappear and then reappear, usually around 20 seconds later; example output is below. Each LUN has 2 paths out of the DB host and 4 paths on each storage device, across two separate SANs.
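To put a number on the "around 20 seconds" observation across many events, the fcp "disappeared"/"reappeared" messages can be paired up per LUN and target. This is a minimal sketch, assuming POSIX awk, the continuation-line format of those messages in the log excerpt, and that all events fall on the same day; the function name `lun_gaps` is hypothetical. A reappearance with no earlier disappearance in the input is ignored.

```shell
#!/bin/sh
# Sketch: pair "disappeared"/"reappeared" fcp messages for the same
# LUN and target, and print how many seconds the path was gone.
# Assumes lines shaped like:
#   Jun  3 15:00:42 dbhost   Lun=3 for target=10e00 disappeared
# and no midnight rollover between paired events.
lun_gaps() {
    awk '
    $5 ~ /^Lun=/ && $7 ~ /^target=/ && ($8 == "disappeared" || $8 == "reappeared") {
        split($3, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        key = $5 " " $7
        if ($8 == "disappeared") {
            gone[key] = secs
        } else if (key in gone) {
            printf "%s back after %ds\n", key, secs - gone[key]
            delete gone[key]
        }
    }
    END {
        # Anything left was never seen coming back in this input.
        for (key in gone) printf "%s still missing\n", key
    }'
}
```

For example, `lun_gaps < /var/adm/messages` prints one line per path that came back, plus any that were still gone at the end of the file.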

The host usually crashes when the pool is not a zpool mirror, which is apparently expected behavior in Sol10u4.

These hosts are x86_64 servers running unpatched Sol10u4. They use QLogic QLA2342 HBAs with the stock qlc driver and, as far as I can tell, MPxIO.

If anyone has troubleshooting tips, or can see something we are doing wrong, we would appreciate the help.

Thanks,
Ethan

=======================================================
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd3):
Jun 3 15:00:33 dbhost Error for Command: write(10) Error Level: Retryable
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Requested Block: 186020890 Error Block: 186020890
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Vendor: Pillar Serial Number:
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun  3 15:00:33 dbhost   Lun=2 for target=21f00 reappeared
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info] Target 0x21f00: Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd3):
Jun 3 15:00:33 dbhost Error for Command: write(10) Error Level: Retryable
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Requested Block: 186020890 Error Block: 186020890
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Vendor: Pillar Serial Number:
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Jun 3 15:00:33 dbhost scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun  3 15:00:33 dbhost   Lun=2 for target=11f00 reappeared
Jun 3 15:00:33 dbhost scsi: [ID 243001 kern.info] Target 0x11f00: Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:33 dbhost scsi: [ID 799468 kern.info] sd6 at scsi_vhci0: name g000b080084001453, bus address g000b080084001453
Jun 3 15:00:33 dbhost genunix: [ID 936769 kern.info] sd6 is /scsi_vhci/[EMAIL PROTECTED]
Jun 3 15:00:33 dbhost genunix: [ID 408114 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd6) online
Jun 3 15:00:33 dbhost genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: degraded, path /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fp1) to target address: w2300000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:34 dbhost genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: optimal, path /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp0) to target address: w2100000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:37 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun  3 15:00:37 dbhost   Lun=2 for target=11e00 reappeared
Jun 3 15:00:37 dbhost scsi: [ID 243001 kern.info] Target 0x11e00: Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:37 dbhost genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: optimal, path /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp0) to target address: w2200000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:42 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun  3 15:00:42 dbhost   Lun=3 for target=10e00 disappeared
Jun 3 15:00:42 dbhost scsi: [ID 243001 kern.info] Target 0x10e00: Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:42 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fcp0):
Jun 3 15:00:42 dbhost offlining lun=3 (trace=0), target=10e00 (trace=b10101)
Jun 3 15:00:42 dbhost genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd5) multipath status: degraded, path /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp0) to target address: w2200000b08040e20,3 is offline Load balancing: round-robin
Jun 3 15:00:47 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun  3 15:00:47 dbhost   Lun=2 for target=21e00 reappeared
Jun 3 15:00:47 dbhost scsi: [ID 243001 kern.info] Target 0x21e00: Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:47 dbhost genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd6) multipath status: optimal, path /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fp1) to target address: w2400000b08040e40,2 is online Load balancing: round-robin
Jun 3 15:00:52 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun  3 15:00:52 dbhost   Lun=3 for target=20e00 disappeared
Jun 3 15:00:52 dbhost scsi: [ID 243001 kern.info] Target 0x20e00: Nonzero peripheral qualifier: Device type=0x0 Peripheral qual=0x1
Jun 3 15:00:52 dbhost scsi: [ID 243001 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fcp1):
Jun 3 15:00:52 dbhost offlining lun=3 (trace=0), target=20e00 (trace=b10101)
Jun 3 15:00:52 dbhost genunix: [ID 408114 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd5) offline
Jun 3 15:00:52 dbhost genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL PROTECTED] (sd5) multipath status: failed, path /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1077,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0 (fp1) to target address: w2400000b08040e20,3 is offline Load balancing: round-robin

_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
