[Bug 1020436] Re: Cannot read superblock after FC multipath failover
The root cause was in fact the lvm.conf filter, but explicitly not for the reason you'd think. The issue is that if I added a|.*| into the regex array, it was ignoring my 'sd[b-z]', loop and ram exclusions, both singly and in combination. It seems to be an obscure issue with the use of square brackets inside the regex in certain ways. E.g. if I use the line:

filter = [ a|.*|, r|.*| ]

or

filter = [ a|.*|, r|loop| ]

then this filters _everything_ in the first example, or just loop devices respectively... just as you would expect. However, as soon as I use something like:

filter = [ a|.*|, r|loop[0-9]| ]

then I don't get any filtering at all... except that defining _only_ the removal filters, per:

filter = [ r|loop[0-9]+| ]

...and other filters of that ilk DO work as expected -- so long as I remove the accept-all regex. I'm not sure if this parsing behaviour is intended -- it could be to do with the way the array is encapsulated. Even this line works as expected:

filter = [ a|.*|, r|sd.[b-z]| ]

possibly since placing the match-any-single-character (.) in there changes the behaviour.

The filter currently in use works as expected and looks like:

filter = [ r|sd[b-z]|, r|ram[0-9]+|, r|loop[0-9]+|, ]

After update-initramfs and reboot, I can successfully disable paths, and the failover works perfectly.

Your insight is much appreciated; I would never have expected /dev/mapper/mpath0 to become aliased to a single disk target. I suspect that what has actually happened is that things were working perfectly for a time, but then broke upon a reboot of the server. I.e., due to the lack of filtering, LVM jumbled the detected volumes at boot time (due to the way /dev is traversed) such that mpath0 was actually pointing at a fixed location, e.g. /dev/sdb, rather than being multipathed as expected.

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
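One way to reason about the surprising results above: LVM evaluates the filter array left to right and the first pattern that matches a device decides accept or reject; a device matching no pattern is accepted by default. A rough stand-alone mimic of that first-match-wins evaluation, using grep as a stand-in for LVM's regex engine (the lvm_filter helper is hypothetical, not an LVM tool):

```shell
# Rough mimic of lvm.conf filter evaluation (assumed semantics):
# first match wins, 'a|re|' accepts, 'r|re|' rejects,
# and an unmatched device is accepted by default.
lvm_filter() {
    dev=$1; shift
    for pat in "$@"; do
        action=${pat%%|*}           # leading 'a' or 'r'
        re=${pat#?|}; re=${re%|}    # text between the | delimiters
        if printf '%s' "$dev" | grep -Eq "$re"; then
            if [ "$action" = a ]; then echo accept; else echo reject; fi
            return
        fi
    done
    echo accept                     # default when no pattern matches
}

lvm_filter /dev/sdb 'a|.*|' 'r|sd[b-z]|'    # prints "accept"
lvm_filter /dev/sdb 'r|sd[b-z]|' 'a|.*|'    # prints "reject"
```

If this mirrors the real parser, the ordering matters more than the brackets: once a|.*| appears first, every device matches it and the later r|...| entries are never consulted, which is consistent with the reject-only filter working once the accept-all regex is removed.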
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Changed in: multipath-tools (Ubuntu)
   Status: Incomplete => Invalid
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Output from before and after commands is attached.

I'm pretty sure you're right about the LVM device filter; I figured that setting the scsi-timeout to 90 seconds (being way longer than the TUR path checker, which is scheduled every 20 seconds) should be enough to handle the device failover. It's as if dm-4 is using sdk (for example), but when this changes, either the journal is somehow stuck trying to update via the failed path, or dm-4 gets removed from the virtual device list because sdk is no longer there, instead of being reinstated with a new path.

(Yes, dm-4 is where the xfs filesystem is (mpath0). The multipathed dm- entries are currently dm-5, dm-6 and dm-7.)

root@hostname:/etc# multipath -ll | grep dm-
mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
mpath1 (360060e80058a07008a07000e) dm-5 HITACHI ,OPEN-V
mpath0 (360060e80058a07008a072417) dm-6 HITACHI ,OPEN-V

root@hostname:/etc# pvscan
  PV /dev/mapper/mpath2   VG test2            lvm2 [4.00 GiB / 0    free]
  PV /dev/mapper/mpath0   VG mvsan            lvm2 [500.00 GiB / 0    free]
  PV /dev/mapper/mpath1   VG test             lvm2 [4.00 GiB / 0    free]
  PV /dev/sda1            VG rgrprod-proc03   lvm2 [135.97 GiB / 108.03 GiB free]
  Total: 4 [643.96 GiB] / in use: 4 [643.96 GiB] / in no VG: 0 [0   ]

/dev/mapper/test2-lvol0 on /tmp/test2 type xfs (rw)
/dev/mapper/test-lvol0 on /tmp/test type ext4 (rw)
/dev/mapper/mvsan-san on /srv/mysql type xfs (rw)

Interestingly, this is before failure:

root@hostname:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | sort -t '/' --key=13
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-2
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-3
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-4
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:1/block/sdi/holders/dm-5
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:0/block/sdh/holders/dm-6
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:2/block/sdg/holders/dm-7
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:2/block/sdj/holders/dm-7
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-7

After failure, that same listing looks like:

root@hostname:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | sort -t '/' --key=13
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:2/block/sdg/holders/dm-7

Which I suppose is to be expected - except I'm guessing that the dm-(2,3,4) devices shouldn't disappear - they should map to the new sdX paths via the physical devices /dev/mapper/mpath(0,1,2) ... or however that pathing failover is meant to work in LVM.

** Attachment added: "before and after tests"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3233696/+files/240712_1445.tgz
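The holders relationships above can also be read in the opposite direction: each dm device exposes a slaves/ directory in sysfs listing the block devices it sits on. A small hedged helper (dm_slaves is my own name; the sysfs-root argument exists only so the sketch can be exercised against a fake tree):

```shell
# Sketch: map each dm-N under a sysfs tree to the block devices backing it,
# via the block/dm-N/slaves directory. Root defaults to /sys.
dm_slaves() {
    root=${1:-/sys}
    for dm in "$root"/block/dm-*; do
        [ -d "$dm" ] || continue
        printf '%s: %s\n' "${dm##*/}" "$(ls "$dm/slaves" 2>/dev/null | xargs)"
    done
}
```

After a failover, a dm device whose map was repointed should still list slaves here; an empty list for dm-2/3/4 would match the disappearance described above.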
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
It doesn't look like your lvm.conf filter is working.

(pvdisplay)
/dev/sdb: block size is 4096 bytes
/dev/sdb: lvm2 label detected
Ignoring duplicate PV RLOhdDURbD7uK2La3MDK2olkP0BF2Tu7 on /dev/sdb - using dm /dev/mapper/mpath0
Closed /dev/sdb
Opened /dev/sdc RO O_DIRECT

The sd names shouldn't even be seen. From the output of vgscan -vvv you should see something like this:

/dev/ram15: No label detected
Closed /dev/ram15
/dev/sdc: Skipping (regex)
/dev/sdc1: Skipping (regex)

Change your filter to what's documented in the multipath guide:
https://help.ubuntu.com/12.04/serverguide/multipath-devices.html#multipath-devices-in-logical-volumes

filter = [ r/block/, r/disk/, r/sd.*/, a/.*/ ]

Run vgscan -vvv after to verify; those pvdisplay errors should also go away. If you want it to be extra quiet, you can filter out ram devices too. Here's mine on an hp server with the following filter:

filter = [ r/block/, r/ram.*/, r/disk/, r/cciss/, r/sd.*/, a/.*/ ]

The thing to know about sd names is that they are not deterministic, ever, so to keep sda as part of your vg set you'll need to determine its unique udev name and filter that in.

Looking back at your original filter, I can see that it's wrong now.

# By default we accept every block device:
filter = [ a/.*/, r|/dev/sd[b-z]| ]

It should be:

# By default we accept every block device:
filter = [ a/.*/, r/sd.[b-z]/ ]

Again, if you hotplug sda, or reverse scan your busses, this ceases to filter accurately; use the unique udev names.

Beyond that, everything looks good: all block devices that are left after failover are responding. I tested failover using lvm + xfs using multipath 0.4.8 with both multibus and priority grouping, and I could not reproduce the issue. Load test was iozone -a in a loop. I just realized, you didn't describe what load these were under during the failure, or how many failovers are required to reach the fault.

Your dm tables look sane.
-0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:64 1000 8:16 1000 8:96 1000 8:160 1000
+0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:64 1000 8:16 1000

You can tell that it's all in one group via the single round-robin directive, which marks the beginning of all paths in that group; clearly you went from 4 -> 2. The fact that the map is updated at all means multipathd did its job. multipathd's main job is the path checking, and managing the policy we define for those paths by updating the dm tables as needed. DM then simply operates on what it has for a table definition and keeps chugging along until that table changes.

Back to the original multipathd output you provided: DM_STATE=active and DM_SUSPENDED=0, so that map was never deactivated. I tried pushing my queue up to 1000 from 128 for rr_min_io; no change.

If it's not the lvm filter then it's something else very subtle. Unless you're comfortable with upgrading to multipath 0.4.9 via apt pinning (done this before with lucid; very stable, and better) as I described in comment #16, the only way we'll get to the bottom of this is if we're hands on, which necessitates an Ubuntu Advantage agreement.
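The path count in a table line like those above can be checked mechanically: the major:minor pairs are the only colon-separated tokens in a multipath table line, so counting them gives the live path total (count_paths is a throwaway helper of mine, not a multipath-tools command):

```shell
# Rough check: count the major:minor path entries in a dm-multipath table line.
count_paths() {
    set -- $(printf '%s\n' "$1" | grep -oE '[0-9]+:[0-9]+')
    echo $#
}

count_paths '0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:64 1000 8:16 1000 8:96 1000 8:160 1000'   # prints 4
count_paths '0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:64 1000 8:16 1000'                        # prints 2
```

Comparing that count before and after fault injection against `multipath -ll` is a quick sanity check that multipathd really updated the map.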
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
It looks like all the /dev/mapper/mpathX targets, and the remaining 'sd' unique paths (with one fabric disabled) were all readable after the path shutdown, but the underlying LVs were somehow still deactivated when the paths disappeared.

root@hostname-03:/srv/mysql# dmesg
<snip>
[1039812.298161] I/O error in filesystem ("dm-4") meta-data dev dm-4 block 0x1f414af0 ("xlog_iodone") error 5 buf count 3072
[1039812.311419] xfs_force_shutdown(dm-4,0x2) called from line 1043 of file /build/buildd/linux-2.6.32/fs/xfs/xfs_log.c. Return address = 0xa0162516
[1039812.311433] Filesystem "dm-4": Log I/O Error Detected. Shutting down filesystem: dm-4
[1039812.326038] Please umount the filesystem, and rectify the problem(s)

root@hostname:/srv/mysql# ls -l
ls: cannot open directory .: Input/output error

root@hostname:/srv/mysql# lvs -o lv_attr
  /dev/test2/lvol0: read failed after 0 of 2048 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mvsan/san: read failed after 0 of 2048 at 0: Input/output error
  Attr
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao

root@hostname:/srv/mysql# dd if=/dev/mapper/mpath
mpath0  mpath1  mpath2
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath0 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0191405 s, 267 kB/s
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath1 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00693051 s, 739 kB/s
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath2 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00559024 s, 916 kB/s

root@hostname:/srv/mysql# ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 2012-07-11 14:24 /dev/sda
brw-rw---- 1 root disk 8,  1 2012-07-11 14:24 /dev/sda1
brw-rw---- 1 root disk 8, 16 2012-07-11 14:24 /dev/sdb
brw-rw---- 1 root disk 8, 32 2012-07-11 14:24 /dev/sdc
brw-rw---- 1 root disk 8, 48 2012-07-11 14:24 /dev/sdd
brw-rw---- 1 root disk 8, 64 2012-07-11 14:24 /dev/sde
brw-rw---- 1 root disk 8, 80 2012-07-11 14:24 /dev/sdf
brw-rw---- 1 root disk 8, 96 2012-07-11 14:24 /dev/sdg

root@hostname:/srv/mysql# dd if=/dev/sdb of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 5.0139e-05 s, 102 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdc of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0726e-05 s, 126 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdd of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0935e-05 s, 125 MB/s
root@hostname:/srv/mysql# dd if=/dev/sde of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.2647e-05 s, 120 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdf of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.1718e-05 s, 123 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdg of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.3425e-05 s, 118 MB/s
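The dd spot checks above can be wrapped in a tiny loop for the next round of testing; probe is just a convenience wrapper of mine around the same dd invocation, and the device globs in the usage line would need adjusting to the host:

```shell
# Sketch: readability probe for one block device, same dd used above.
probe() {
    if dd if="$1" of=/dev/null count=10 2>/dev/null; then
        echo "$1: readable"
    else
        echo "$1: FAILED"
    fi
}

# e.g.: for dev in /dev/mapper/mpath? /dev/sd[b-g]; do probe "$dev"; done
```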
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Hmm, we've got some counter indicators here. lvs claims that the volumes are active. But the probe itself is showing problems reading the volumes. XFS is telling us that it cannot write its journal to disk.

[1039812.311433] Filesystem dm-4: Log I/O Error Detected. Shutting down filesystem: dm-4

fs/xfs/xfs_rw.c

  90 /*
  91  * Force a shutdown of the filesystem instantly while keeping
  92  * the filesystem consistent. We don't do an unmount here; just shutdown
  93  * the shop, make sure that absolutely nothing persistent happens to
  94  * this filesystem after this point.
  95  */
  96 void
  97 xfs_do_force_shutdown(
 ...
 127         if (flags & SHUTDOWN_CORRUPT_INCORE) {
 128                 xfs_cmn_err(XFS_PTAG_SHUTDOWN_CORRUPT, CE_ALERT, mp,
 129         "Corruption of in-memory data detected.  Shutting down filesystem: %s",
 130                         mp->m_fsname);
 131                 if (XFS_ERRLEVEL_HIGH <= xfs_error_level) {
 132                         xfs_stack_trace();
 133                 }
 134         } else if (!(flags & SHUTDOWN_FORCE_UMOUNT)) {
 135                 if (logerror) {
 136                         xfs_cmn_err(XFS_PTAG_SHUTDOWN_LOGERROR, CE_ALERT, mp,
 137                 "Log I/O Error Detected.  Shutting down filesystem: %s",
 138                                 mp->m_fsname);

The logs conveniently tell us where it was called from too.

1009 void
1010 xlog_iodone(xfs_buf_t *bp)
1011 {
 ...
1036         /*
1037          * Race to shutdown the filesystem if we see an error.
1038          */
1039         if (XFS_TEST_ERROR((XFS_BUF_GETERROR(bp)), l->l_mp,
1040                         XFS_ERRTAG_IODONE_IOERR, XFS_RANDOM_IODONE_IOERR)) {
1041                 xfs_ioerror_alert("xlog_iodone", l->l_mp, bp, XFS_BUF_ADDR(bp));
1042                 XFS_BUF_STALE(bp);
1043                 xfs_force_shutdown(l->l_mp, SHUTDOWN_LOG_IO_ERROR);
1044                 /*
1045                  * This flag will be propagated to the trans-committed
1046                  * callback routines to let them know that the log-commit
1047                  * didn't succeed.
1048                  */
1049                 aborted = XFS_LI_ABORTED;

I assume dm-4 is the LV that XFS is mounted on; did you run the dd test on that? I'm starting to wonder if the LVM device filter is lying to us: after failover, something changes which misrepresents the LV and then XFS bails out.
If you can perform that dd for every PV that backs dm-4 successfully, then there's something wrong with the DM map for those LVs after failover occurs.

OK, what I need from you now is a before and after (same fault injection method) of:

0) ls -lR /dev/ > dev_major_minor.log
1) lvs -o lv_attr
2) pvdisplay -vvv
3) lvdisplay -vvv
4) dmsetup table -v
5) dd test on all block devices: lv, mp, sd
6) dmesg output

Please attach this as a single tarball that has a timestamp in the filename and a directory structure of:

foo.tgz
  before/
  after/

If this all checks out, then what's probably happening is that when multipath begins the failover process, there's enough of a delay that XFS simply bails out early, before IO is ready to be sent down the remaining paths. Group by priority may perform better here and is something you can test. I looked at the XFS mount arguments and didn't find anything that would make it more lenient in these situations.

If you can manage it, a LUN formatted with ext3 under these circumstances would help in ruling out whether the filesystem is part of the problem. Thanks.
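The numbered capture list above can be scripted; this is my own wrapper around the requested commands (the dd pass over lv/mp/sd devices is left as a manual step since the device names vary per host):

```shell
# Sketch: gather one round of the requested diagnostics into a directory.
# Run once as 'before', inject the fault, then run again as 'after'.
collect() {
    dir=$1; mkdir -p "$dir"
    ls -lR /dev      > "$dir/dev_major_minor.log" 2>&1
    lvs -o lv_attr   > "$dir/lvs.log"             2>&1
    pvdisplay -vvv   > "$dir/pvdisplay.log"       2>&1
    lvdisplay -vvv   > "$dir/lvdisplay.log"       2>&1
    dmsetup table -v > "$dir/dmsetup_table.log"   2>&1
    dmesg            > "$dir/dmesg.log"           2>&1
}

# collect before
# ...inject the fault, run the dd tests by hand...
# collect after
# tar czf "$(date +%y%m%d_%H%M).tgz" before after
```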
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Scratch the ext3 test, I just re-read the description:

"I've replicated this behaviour using both xfs and ext4 filesystems, on multiple different luns presented to the server."
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Alrighty.

[OK] lvm.conf device blacklist
[OK] multipath.conf, sans fix I already pointed out

You can ignore that FW blurb, it's just a script that shouldn't be so noisy.

Failover logs *look good*:

bug-1020436$ grep fail_path multipathd_stdout.log | sort -u
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:112
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:160
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:128
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:176
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:144
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:192
# 3 mins go by... oh and several devices are failed twice
# Jul 10 15:10:19 devices start to return
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:16
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0 NF fail_path 8:64
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:32
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1 NF fail_path 8:80
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:48
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2 NF fail_path 8:96

And according to udev (note that I re-sequenced the grep and grouped by mpathX):
bug-1020436$ grep -A4 PATH_FAILED multipathd_stdout.log
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=1
Jul 10 15:08:25 | DM_PATH=8:112
Jul 10 15:08:25 | DM_NR_VALID_PATHS=3
Jul 10 15:08:25 | DM_NAME=mpath0
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=2
Jul 10 15:08:25 | DM_PATH=8:160
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath0
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=3
Jul 10 15:08:25 | DM_PATH=8:160
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath0
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=1
Jul 10 15:08:25 | DM_PATH=8:128
Jul 10 15:08:25 | DM_NR_VALID_PATHS=3
Jul 10 15:08:25 | DM_NAME=mpath1
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=2
Jul 10 15:08:25 | DM_PATH=8:176
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath1
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=3
Jul 10 15:08:25 | DM_PATH=8:176
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath1
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=1
Jul 10 15:08:25 | DM_PATH=8:144
Jul 10 15:08:25 | DM_NR_VALID_PATHS=3
Jul 10 15:08:25 | DM_NAME=mpath2
--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=2
Jul 10 15:08:25 | DM_PATH=8:192
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath2

This all looks OK; you can see the paths count down and stop at the right count. The way multibus works is a lot like network bonding link aggregation: IO is simply steered down the remaining paths. What might have happened is the LVs backed by this mpath device somehow got deactivated, and that's why the filesystem can't write back.

Try this for me please. Shut down those paths and keep them down, then after about a minute, run:

# lvs -o lv_attr

We should see 'w' and 'a' for every volume; if not, that's the problem.
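That attr check can be automated. In lvm2's attribute string (e.g. "-wi-ao") the fifth character is the state flag, 'a' when the volume is active; check_active below is a hypothetical helper of mine built on that layout:

```shell
# Sketch: fail if any lv_attr string lacks the 'a' (active) state bit.
# Feed it the output of `lvs --noheadings -o lv_attr`.
check_active() {
    awk '{ if (substr($1, 5, 1) != "a") bad++ } END { exit bad ? 1 : 0 }'
}

# e.g.: lvs --noheadings -o lv_attr | check_active || echo "inactive LV found"
```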
Also, while in the failed state, see if you can do a dd read from one of the mpaths, like:

dd if=/dev/mapper/mpath0 of=/dev/null count=10

If that works, try the same thing with a random sd device.

I don't see anything amiss in the strace. The SG ioctls look OK and the ioctls made to DM make sense. We want to determine at which level sending block IOs gets hung up.

** Changed in: multipath-tools (Ubuntu)
   Status: Confirmed => Incomplete
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
To date, the issue hasn't been observed on the two physical hosts we have running 10.04.1 LTS with the same multipath-tools version, which certainly raises a flag. They are being used as mission critical / production database servers, so I'm scheduling a window in which we will be able to confirm whether or not it is manifesting on those hosts too.
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Changed in: multipath-tools (Ubuntu)
   Status: Incomplete => Confirmed
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Thanks for all the feedback; there's quite a bit of information here to sort through. In the meanwhile, it would be an interesting data point to see how well this array functions under multipath 0.4.9, which is what Precise, the new LTS, uses. This can be accomplished by using apt pinning.

http://askubuntu.com/questions/103320/install-packages-from-newer-release-without-building-apt-pinning/103338#103338

In your case:

$ sudo tee /etc/apt/sources.list.d/precise.list <<EOF
deb http://us.archive.ubuntu.com/ubuntu/ precise main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ precise main restricted
deb http://us.archive.ubuntu.com/ubuntu/ precise-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ precise-updates main restricted
EOF

$ sudo tee /etc/apt/preferences <<EOF
Package: multipath-tools*
Pin: release n=precise
Pin-Priority: 990

Package: *
Pin: release n=lucid
Pin-Priority: 900

Package: *
Pin: release o=Ubuntu
Pin-Priority: -10
EOF

$ sudo apt-get update
$ sudo apt-get upgrade

That should update multipath-tools and boot to 0.4.9; if not, then:

$ sudo apt-get install -t precise multipath-tools multipath-tools-boot

This should unjam you while we triage the data, and will provide a valuable data point to aid in backporting if necessary. Note that the config syntax has changed slightly.

https://help.ubuntu.com/12.04/serverguide/device-mapper-multipathing.html#multipath-new-and-changed-features

It's ok to leave both directives in the file; multipath simply globs keywords and uses the ones that fit. Remember to update your initrd after changing the config file.

I assume you have multiple SAN clients; please keep one at 0.4.8 so we can continue to debug this. Thanks.
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Attachment removed: "strace_mpathd.log"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218020/+files/strace_mpathd.log

** Attachment added: "strace_mpathd.log"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218069/+files/strace_mpathd.log
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
lvm.conf blacklist should be fine; it searches /dev but filters all /dev/sd.* I'll attach those files momentarily.

I didn't see any kpartx processes running at all; there wasn't anything in an uninterruptable sleep state ('D') either. Listing all kernel and some additional processes here (removed stuff like apache / php processes and certain publicly identifiable stuff that's not relevant):

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.6  0.0  23828  2064 ?        Ss   14:59   0:04 /sbin/init
root         2  0.0  0.0      0     0 ?        S    14:59   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/0]
root         6  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/1]
root         7  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/1]
root         8  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/1]
root         9  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/2]
root        10  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/2]
root        11  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/2]
root        12  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/3]
root        13  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/3]
root        14  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/3]
root        15  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/4]
root        16  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/4]
root        17  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/4]
root        18  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/5]
root        19  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/5]
root        20  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/5]
root        21  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/6]
root        22  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/6]
root        23  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/6]
root        24  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/7]
root        25  0.1  0.0      0     0 ?        S    14:59   0:01 [ksoftirqd/7]
root        26  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/7]
root        27  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/8]
root        28  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/8]
root        29  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/8]
root        30  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/9]
root        31  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/9]
root        32  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/9]
root        33  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/10]
root        34  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/10]
root        35  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/10]
root        36  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/11]
root        37  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/11]
root        38  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/11]
root        39  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/12]
root        40  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/12]
root        41  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/12]
root        42  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/13]
root        43  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/13]
root        44  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/13]
root        45  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/14]
root        46  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/14]
root        47  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/14]
root        48  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/15]
root        49  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/15]
root        50  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/15]
root        51  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/16]
root        52  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/16]
root        53  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/16]
root        54  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/17]
root        55  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/17]
root        56  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/17]
root        57  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/18]
root        58  0.0  0.0
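The D-state scan mentioned above can be done without eyeballing the whole table; d_state is just an awk filter of mine over ps output, assuming the state column comes first:

```shell
# Sketch: reduce a `ps -eo state,pid,comm` listing to uninterruptible-sleep tasks.
d_state() { awk '$1 ~ /^D/ { print }'; }

# e.g.: ps -eo state,pid,comm | d_state
```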
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Your blacklist definition is wrong. You provided:

blacklist {
    wwid 3600605b0039afe20ff54052e7d38
    vendor SMART
    product SMART
}
defaults {
    user_friendly_names yes
}

'wwid' identifies an entire device on its own, while vendor/product identify a different device and must be enclosed in a device { } block. If you look at the 'show config' output, you'll note that the device you tried to blacklist by vendor/product isn't there. Change it to:

blacklist {
    wwid 3600605b0039afe20ff54052e7d38
    device {
        vendor SMART
        product SMART
    }
}
defaults {
    user_friendly_names yes
}

Then update your initrd (multipath.conf is copied there) and reboot.

Also, it would be nice to see a list of all your storage devices. I prefer lsscsi, e.g.

root@nashira:~# lsscsi
[0:0:0:0] cd/dvd  hp      DVD A  DS8A5LH 1HE3 /dev/sr0
[2:0:0:0] disk    NETAPP  LUN            8020 /dev/sdb
[2:0:1:0] disk    NETAPP  LUN            8020 /dev/sdd
[3:0:0:0] storage HP      P410i          3.66 -
[3:0:0:1] disk    HP      LOGICAL VOLUME 3.66 /dev/sda
[5:0:0:0] disk    NETAPP  LUN            8020 /dev/sdc
[5:0:1:0] disk    NETAPP  LUN            8020 /dev/sde

Finally, since this is FC, how is it zoned? Detailed port-level topology please. Thanks.

-- 
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
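The first-match logic behind that correction can be illustrated with a small Python model of blacklist evaluation. This is an illustrative sketch only, not multipath-tools code; the device names and wwids below are hypothetical, and the anchored-regex matching is an assumption based on the documented config format:

```python
import re

def is_blacklisted(dev, wwids, device_entries):
    # dev is a dict with "wwid", "vendor", "product" keys.
    # wwids: set of blacklisted wwids; device_entries: list of
    # (vendor_regex, product_regex) pairs, i.e. parsed device { } blocks.
    # A bare "vendor SMART / product SMART" pair outside a device { }
    # block never becomes a device entry, so it matches nothing.
    if dev["wwid"] in wwids:
        return True
    for vendor_re, product_re in device_entries:
        if re.match(vendor_re, dev["vendor"]) and re.match(product_re, dev["product"]):
            return True
    return False

# Hypothetical devices for illustration:
usb_stick = {"wwid": "0123456789abcdef", "vendor": "SMART", "product": "SMART"}

# Broken config: the vendor/product pair was dropped by the parser.
print(is_blacklisted(usb_stick, {"3600605b0039afe20ff54052e7d38"}, []))
# Corrected config: the device { } entry now matches.
print(is_blacklisted(usb_stick, {"3600605b0039afe20ff54052e7d38"}, [("SMART", "SMART")]))
```

Running it prints False for the broken config and True for the corrected one, which is exactly the difference visible in the 'show config' output.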
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Storage devices:

$ sudo lsscsi
[0:0:0:0] cd/dvd TSSTcorp CDDVDW TS-L633F IT03 /dev/sr0
[4:2:0:0] disk   IBM      ServeRAID M5015 2.12 /dev/sda
[5:0:0:0] disk   HITACHI  OPEN-V          6008 /dev/sdb
[5:0:0:1] disk   HITACHI  OPEN-V          6008 /dev/sdc
[5:0:0:2] disk   HITACHI  OPEN-V          6008 /dev/sdd
[5:0:1:0] disk   HITACHI  OPEN-V          6008 /dev/sde
[5:0:1:1] disk   HITACHI  OPEN-V          6008 /dev/sdf
[5:0:1:2] disk   HITACHI  OPEN-V          6008 /dev/sdg
[6:0:0:0] disk   HITACHI  OPEN-V          6008 /dev/sdh
[6:0:0:1] disk   HITACHI  OPEN-V          6008 /dev/sdi
[6:0:0:2] disk   HITACHI  OPEN-V          6008 /dev/sdj
[6:0:1:0] disk   HITACHI  OPEN-V          6008 /dev/sdk
[6:0:1:1] disk   HITACHI  OPEN-V          6008 /dev/sdl
[6:0:1:2] disk   HITACHI  OPEN-V          6008 /dev/sdm

Zoning:
- There are two HBAs. Each HBA is plugged into a separate Brocade DCX switch, and each switch is connected to a different fabric (Fabric A and Fabric B respectively).
- The storage array presenting the LUNs is also directly connected to the same switches, via 4 ports in each fabric (array ports 1A, 3A, 6A, 8A in Fabric A; ports 2A, 4A, 5A, 8A in Fabric B).
- Each HBA is zoned so as to see two storage array ports (2 in each fabric). Basically this means that when I disable the FC port an HBA is connected to, it loses two paths to the LUN.

HBA1 - DCX 1 (Fabric A) - 3A, 6A
HBA2 - DCX 2 (Fabric B) - 4A, 5A

Blacklisting: thanks for pointing this out. (Note that I only added the 'SMART' vendor/product entry after the problem initially exhibited, in an effort to remove the warning about a USB flash storage device that appeared in the multipath -ll output. It _shouldn't_ be a factor.) Fixed now, and the initrd has been updated. Reboot/testing later today (likely in about 3 hours' time).
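Given that zoning, the lsscsi output is internally consistent. A back-of-the-envelope check, with the figures taken from the description above:

```python
# Two HBAs, each zoned to see two array ports => 4 paths per LUN.
hbas = 2
ports_per_hba = 2
luns = 3  # LUN IDs 0, 1 and 2 in the lsscsi output

paths_per_lun = hbas * ports_per_hba
sd_devices = paths_per_lun * luns

print(paths_per_lun)  # 4
print(sd_devices)     # 12 -> matches the twelve HITACHI devices sdb..sdm
```

Disabling one HBA's FC port therefore removes 2 of the 4 paths per LUN, which is what the "loses two paths" remark above describes.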
$ echo show config | sudo multipathd -k
multipathd show config
defaults {
	user_friendly_names yes
}
blacklist {
	devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
	devnode ^hd[a-z]
	devnode ^dcssblk[0-9]*
	devnode ^cciss!c[0-9]d[0-9]*
	wwid 3600605b0039afe20ff54052e7d38
	device {
		vendor SMART
		product SMART
	}
	device {
		vendor DGC
		product LUNZ
	}
	device {
		vendor IBM
		product S/390.*
	}
}
blacklist_exceptions {
}
--snip--
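The devnode entries in the default blacklist above are anchored regexes applied to kernel device names. A quick Python check of which names they catch (illustrative only; this models the pattern matching, not multipathd itself):

```python
import re

# The devnode patterns from the 'show config' output above.
DEVNODE_PATTERNS = [
    r"^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*",
    r"^hd[a-z]",
    r"^dcssblk[0-9]*",
    r"^cciss!c[0-9]d[0-9]*",
]

def devnode_blacklisted(name):
    # A device is blacklisted if any devnode pattern matches its name.
    return any(re.match(p, name) for p in DEVNODE_PATTERNS)

print(devnode_blacklisted("loop0"))  # True
print(devnode_blacklisted("hda"))    # True
print(devnode_blacklisted("sdb"))    # False -> sd* devices stay eligible for multipath
```

So the FC path members (sdb..sdm) are never excluded by devnode; only the wwid and device { } entries can remove a SCSI disk here.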
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
I note this segfault has appeared in dmesg on the last couple of boots:

[  304.214560] multipathd[3083]: segfault at a ip 7f5eb838798a sp 7fffb225ebb0 error 4 in libc-2.11.1.so[7f5eb831+17a000]

After adjusting the blacklist, updating the initrd and rebooting, the same behaviour is exhibited when I shut the active FC port:

[  429.234838] qla2xxx :24:00.0: LOOP DOWN detected (2 3 0).
[  459.165399] rport-6:0-1: blocked FC remote port time out: removing target and saving binding
[  459.165875] rport-6:0-0: blocked FC remote port time out: removing target and saving binding
[  459.166496] device-mapper: multipath: Failing path 8:112.
[  459.166573] device-mapper: multipath: Failing path 8:128.
[  459.166765] device-mapper: multipath: Failing path 8:144.
[  459.166831] device-mapper: multipath: Failing path 8:160.
[  459.166914] device-mapper: multipath: Failing path 8:176.
[  459.166965] device-mapper: multipath: Failing path 8:192.
[  459.219341] sd 6:0:1:0: [sdk] Synchronizing SCSI cache
[  459.219374] sd 6:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.220728] device-mapper: multipath: Failing path 8:112.
[  459.299111] sd 6:0:1:1: [sdl] Synchronizing SCSI cache
[  459.299138] sd 6:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.300305] device-mapper: multipath: Failing path 8:128.
[  459.378913] sd 6:0:1:2: [sdm] Synchronizing SCSI cache
[  459.378943] sd 6:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.381678] device-mapper: multipath: Failing path 8:144.
[  459.430852] sd 6:0:0:1: [sdi] Unhandled error code
[  459.430854] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.430857] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[  459.430862] end_request: I/O error, dev sdi, sector 8388480
[  459.431165] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431166] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.431168] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[  459.431173] end_request: I/O error, dev sdi, sector 8388592
[  459.431473] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431475] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.431477] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.431482] end_request: I/O error, dev sdi, sector 0
[  459.431767] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431770] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.431772] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[  459.431776] end_request: I/O error, dev sdi, sector 8
[  459.432072] sd 6:0:0:1: [sdi] Unhandled error code
[  459.432073] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.432075] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.432081] end_request: I/O error, dev sdi, sector 0
[  459.432397] sd 6:0:0:2: [sdj] Unhandled error code
[  459.432399] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.432401] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[  459.432406] end_request: I/O error, dev sdj, sector 8388480
[  459.432712] sd 6:0:0:2: [sdj] Unhandled error code
[  459.432714] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.432717] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[  459.432723] end_request: I/O error, dev sdj, sector 8388592
[  459.433034] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433036] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.433039] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.433045] end_request: I/O error, dev sdj, sector 0
[  459.433345] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433347] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.433350] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[  459.433356] end_request: I/O error, dev sdj, sector 8
[  459.433672] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433674] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.433677] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.433684] end_request: I/O error, dev sdj, sector 0
[  459.456208] sd 6:0:0:0: [sdh] Synchronizing SCSI cache
[  459.456236] sd 6:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.465583] sd 6:0:0:2: [sdj] Unhandled error code
[  459.465585] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.465588] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.465593] end_request: I/O error, dev sdj, sector 0
[  459.548504] sd 6:0:0:1: [sdi] Synchronizing SCSI cache
[  459.548540] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  459.625817] sd 6:0:0:2: [sdj] Synchronizing SCSI cache
[
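The 8:112 through 8:192 identifiers in the "Failing path" messages are block major:minor numbers. For whole sd disks they can be decoded with a small helper; this is a sketch assuming the conventional sd layout (major 8, minors in steps of 16), which is consistent with the device numbers this host reports:

```python
def sd_name(major, minor):
    # Whole-disk sd devices use major 8 with minors 0, 16, 32, ...;
    # minor // 16 gives the disk index (0 -> sda, 1 -> sdb, ...).
    # Only valid for the first 16 disks on major 8.
    assert major == 8 and minor % 16 == 0
    return "sd" + chr(ord("a") + minor // 16)

# The six failing paths from the log above:
for minor in (112, 128, 144, 160, 176, 192):
    print(sd_name(8, minor))
# prints sdh, sdi, sdj, sdk, sdl, sdm -- the six disks on SCSI host 6,
# i.e. exactly the paths behind the HBA whose port was shut.
```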
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Attachment added: strace_mpathd_110712.log
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3219323/+files/strace_mpathd_110712.log
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Attachment added: multipathd_stdout_110712.log
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3219324/+files/multipathd_stdout_110712.log
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
To assist in further debugging, stop the multipath daemon, then run it in the foreground with maximum verbosity, under strace. Log strace to one file and the foreground stdout and stderr to another, and post the results here. Also include your multipath.conf, lvm.conf, the output of 'dmsetup table', and 'echo show config | multipathd -k'.

While the system is in the failure mode, when running 'ps aux' do you see a series of kpartx processes in uninterruptible state? I presume you've configured your lvm.conf blacklist correctly? Thanks.

https://help.ubuntu.com/12.04/serverguide/multipath-devices.html#multipath-devices-in-logical-volumes

** Changed in: multipath-tools (Ubuntu)
   Assignee: (unassigned) => Peter Petrakis (peter-petrakis)

** Changed in: multipath-tools (Ubuntu)
   Status: New => Incomplete
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
Hi Peter,

Thanks for picking this up. Note: I ran 'update-initramfs -u -k all' and rebooted, just for good measure, before proceeding. There's some output regarding a missing firmware file; I'm not sure it's relevant:

root@rgrprod-pmdh-proc-03:/etc# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-2.6.32-41-server
W: Possible missing firmware /lib/firmware/ql8100_fw.bin for module qla2xxx

Oddly, multipathd didn't log a single thing to stderr.

** Attachment added: strace_mpathd.log
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218020/+files/strace_mpathd.log
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
note: multipath_stderr.log exists, but is empty.

** Attachment added: multipathd_stdout.log
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218021/+files/multipathd_stdout.log
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
user@hostname:~$ sudo dmsetup table
mpath2: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:48 1000 8:96 1000 8:144 1000 8:192 1000
mpath1: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:128 1000 8:32 1000 8:80 1000 8:176 1000
mpath0: 0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:160 1000 8:16 1000 8:64 1000 8:112 1000
test2-lvol0: 0 8380416 linear 8:192 384
test-lvol0: 0 8380416 linear 8:176 384
vgname2-san: 0 1048567808 linear 8:160 384
vgname-slash: 0 48824320 linear 8:1 384
vgname-swap: 0 9764864 linear 8:1 48824704

user@hostname:~$ sudo echo show config | sudo multipathd -k
multipathd show config
defaults {
	user_friendly_names yes
}
blacklist {
	devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
	devnode ^hd[a-z]
	devnode ^dcssblk[0-9]*
	devnode ^cciss!c[0-9]d[0-9]*
	wwid 3600605b0039afe20ff54052e7d38
	device { vendor DGC product LUNZ }
	device { vendor IBM product S/390.* }
}
blacklist_exceptions {
}
devices {
	device { vendor APPLE* product Xserve RAID path_grouping_policy multibus }
	device { vendor 3PARdata product VV path_grouping_policy multibus }
	device { vendor DEC product HSG80 path_grouping_policy group_by_prio path_checker hp_sw features 1 queue_if_no_path hardware_handler 1 hp_sw prio_callout /sbin/mpath_prio_hp_sw /dev/%n }
	device { vendor HP product A6189A path_grouping_policy multibus path_checker readsector0 }
	device { vendor (COMPAQ|HP) product (MSA|HSV)1.0.* path_grouping_policy group_by_prio path_checker hp_sw features 1 queue_if_no_path hardware_handler 1 hp_sw prio_callout /sbin/mpath_prio_hp_sw /dev/%n }
	device { vendor HP product MSA VOLUME path_grouping_policy group_by_prio path_checker tur prio_callout /sbin/mpath_prio_alua /dev/%n failback immediate }
	device { vendor (COMPAQ|HP) product (MSA|HSV)1.1.* path_grouping_policy group_by_prio path_checker tur prio_callout /sbin/mpath_prio_alua /dev/%n failback immediate }
	device { vendor HP product HSV2.* path_grouping_policy group_by_prio path_checker tur prio_callout /sbin/mpath_prio_alua /dev/%n failback immediate }
	device { vendor HP product LOGICAL VOLUME.* path_grouping_policy multibus getuid_callout /lib/udev/scsi_id -n -g -u -d /dev/%n path_checker tur }
	device { vendor DDN product SAN DataDirector path_grouping_policy multibus }
	device { vendor EMC product SYMMETRIX path_grouping_policy multibus getuid_callout /lib/udev/scsi_id -g -u -ppre-spc3-83 -d /dev/%n path_checker readsector0 }
	device { vendor DGC product .* product_blacklist LUNZ path_grouping_policy group_by_prio path_checker emc_clariion features 1 queue_if_no_path hardware_handler 1 emc prio_callout /sbin/mpath_prio_emc /dev/%n failback immediate no_path_retry 60 }
	device { vendor FSC product CentricStor path_grouping_policy group_by_serial path_checker readsector0 }
	device { vendor (HITACHI|HP) product OPEN-.* path_grouping_policy multibus path_checker tur features 1 queue_if_no_path }
	device { vendor HITACHI product DF.* path_grouping_policy group_by_prio path_checker tur features 1 queue_if_no_path prio_callout /sbin/mpath_prio_hds_modular /dev/%n failback immediate }
	device { vendor IBM product ProFibre 4000R path_grouping_policy multibus path_checker readsector0 }
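Each mpathN line in the dmsetup table above follows the device-mapper multipath target format, and the four "8:NN 1000" pairs after "round-robin" are the path devices with their selector weights. A small Python extractor, written against the table lines shown here rather than as a general dm table parser:

```python
import re

def multipath_paths(table_line):
    # Pull the map name and the major:minor path devices out of a
    # "name: start len multipath ... round-robin 0 N 1 maj:min weight ..."
    # table line, as printed by 'dmsetup table'.
    name = table_line.split(":", 1)[0]
    args = table_line.split("round-robin", 1)[1]
    paths = re.findall(r"(\d+:\d+) \d+", args)
    return name, paths

line = ("mpath0: 0 1048576000 multipath 1 queue_if_no_path 0 1 1 "
        "round-robin 0 4 1 8:160 1000 8:16 1000 8:64 1000 8:112 1000")
print(multipath_paths(line))
# -> ('mpath0', ['8:160', '8:16', '8:64', '8:112'])
```

Note that mpath0's four paths (8:160, 8:16, 8:64, 8:112) span both SCSI hosts, which is what a healthy multipath map should look like; the bug's symptom was mpath0 collapsing onto a single underlying disk.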
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Attachment added: multipath.conf
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218022/+files/multipath.conf
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
** Attachment added: lvm.conf
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218023/+files/lvm.conf
[Bug 1020436] Re: Cannot read superblock after FC multipath failover
note: /dev/sda is a raid volume used for the root vg; the filter is actually r|/dev/sd[b-z]|
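LVM evaluates filter entries in order and the first matching pattern decides; a device matching no pattern is accepted. The effect of a filter like r|/dev/sd[b-z]| can be modelled in a few lines of Python. This is an illustrative model of the documented first-match semantics, not LVM's actual parser:

```python
import re

def lvm_accepts(path, filters):
    # filters is an ordered list of ('a'|'r', regex) pairs, mirroring the
    # lvm.conf filter = [ "a|...|", "r|...|" ] array. LVM regexes are
    # unanchored searches against the device path; first match wins, and
    # an unmatched device is accepted by default.
    for action, pattern in filters:
        if re.search(pattern, path):
            return action == "a"
    return True

# Reject the FC path members so LVM only sees the multipath device:
filters = [("r", r"/dev/sd[b-z]"), ("r", r"ram[0-9]+"), ("r", r"loop[0-9]+")]

print(lvm_accepts("/dev/sda", filters))            # True  (local RAID, root VG)
print(lvm_accepts("/dev/sdb", filters))            # False (FC path member)
print(lvm_accepts("/dev/mapper/mpath0", filters))  # True  (the multipath map)
```

This is why the filter above matters for the bug: without it, LVM is free to activate the VG on whichever /dev/sdX it scans first, pinning the volume to a single path instead of the multipath device.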