[Bug 460906] Re: disk/by-uuid/foo symlink points to snapshot rather than the origin

2012-08-21 Thread Steve Fisher
Users should probably be warned in some prominent location that if
they're mounting by UUID, leaving an LVM snapshot in place could
result in the snapshot being mounted on reboot.

Changing the UUID of the 'real' disk during live operation simply
because it has been snapshotted sounds fairly crazy - there's a case
to be made for the UUID being effectively immutable after boot.

It might make more sense to filter via mapper and/or by-path (I'm using
by-path in LVM's filter right now to cleanly separate the multipath
devices from the regular sd* devices) if you're expecting to have stale
LVM snapshots around at boot time.
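
By way of illustration only (the device names and patterns here are
just examples, not a recommendation), the sort of lvm.conf filter I
mean looks roughly like:

filter = [ "a|^/dev/mapper/.*|", "a|^/dev/sda.*|", "r|.*|" ]

or the same thing with the accepts anchored on /dev/disk/by-path/...
symlinks rather than bare device names.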

A user-configurable choice might make for a reasonable compromise.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/460906

Title:
  disk/by-uuid/foo symlink points to snapshot rather than the origin

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/460906/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1030667] [NEW] LVM2 regex exclusion filtering applies incorrectly in some cases

2012-07-29 Thread Steve Fisher
Public bug reported:

lvm.conf: if I add a|.*| to the regex array for the filter, it
ignores my 'sd[b-z]' exclusion and any other entries (e.g. loop, ram)
that come after it.

It seems to be an issue with how square brackets inside the regexes
are handled in certain cases.

E.g. if I use the line:

filter = [ a|.*|, r|.*| ]
or
filter = [ a|.*|, r|loop| ]

Then this filters _everything_ in the first example, or just the loop
devices in the second... just as you would expect.

However as soon as I use something like:

filter = [ a|.*|, r|loop[0-9]| ]

Then I don't get any filtering at all... except that defining _only_ the
removal filters, per:

filter = [ r|loop[0-9]| ]

... and other filters of that ilk DO work as expected -- so long as I
remove the accept-all regex.

Even this line works as expected:

filter = [ a|.*|, r|sd.| ]

-- Possibly since placing the match-any-single-character (.) in
there changes the behaviour.

The filter currently in use works as expected and looks like:

filter = [ r|sd[b-z]|, r|ram[0-9]+|, r|loop[0-9]+|, ]
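
For what it's worth, a workaround that sidesteps the problem entirely
(illustrative only - the device names are just examples for this box)
is to drop the leading accept-all and list explicit accepts followed
by a reject-all:

filter = [ "a|^/dev/sda.*|", "a|^/dev/mapper/.*|", "r|.*|" ]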

---

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION=Ubuntu 10.04.3 LTS

---

lvm2:
  Installed: 2.02.54-1ubuntu4.1
  Candidate: 2.02.54-1ubuntu4.1
  Version table:
 *** 2.02.54-1ubuntu4.1 0
500 http://au.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
500 http://security.ubuntu.com/ubuntu/ lucid-security/main Packages
100 /var/lib/dpkg/status
 2.02.54-1ubuntu4 0
500 http://au.archive.ubuntu.com/ubuntu/ lucid/main Packages

** Affects: lvm2 (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: lucid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1030667

Title:
  LVM2 regex exclusion filtering applies incorrectly in some cases

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1030667/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-27 Thread Steve Fisher
The root cause was in fact the lvm.conf filter, though not for the
reason you'd think.

The issue is that if I added a|.*| to the regex array, it ignored
my 'sd[b-z]', loop and ram exclusions, both singly and in combination.

It seems to be an obscure issue with how square brackets inside the
regexes are handled in certain cases.

E.g. if I use the line:

filter = [ a|.*|,  r|.*| ]  
or 
filter = [ a|.*|,  r|loop| ]

Then this filters _everything_ in the first example, or just the loop
devices in the second... just as you would expect.

However as soon as I use something like:

filter = [ a|.*|, r|loop[0-9]| ]

Then I don't get any filtering at all... except that defining _only_ the
removal filters, per:

filter = [ r|loop[0-9]+| ]

... and other filters of that ilk DO work as expected -- so long as I
remove the accept-all regex.

I'm not sure if this parsing behaviour is intended -- it could have to
do with the way the array is encapsulated.

Even this line works as expected:

filter = [ a|.*|,  r|sd.[b-z]| ]

-- Possibly since placing the match-any-single-character (.) in
there changes the behaviour.

The filter currently in use works as expected and looks like:

filter = [ r|sd[b-z]|, r|ram[0-9]+|, r|loop[0-9]+|, ]

After update-initramfs and reboot, I can successfully disable paths, and
the failover works perfectly.

Your insight is much appreciated; I would never have expected
/dev/mapper/mpath0 to become aliased to a single disk target.

I suspect that what has actually happened is that things were working
perfectly for a time, but then broke upon a reboot of the server.

I.e., due to the lack of filtering, LVM jumbled the detected volumes at
boot time (due to the way /dev is traversed) such that mpath0 was
actually pointing at a fixed location, e.g. /dev/sdb, rather than being
multipathed as expected.
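
For anyone wanting to double-check the same thing on their own box,
something along these lines (standard LVM reporting options, nothing
exotic) shows which underlying devices the PVs and LVs actually sit on,
so you can confirm they point at /dev/mapper/mpathX rather than a bare
/dev/sdX:

sudo pvs                # the PV column should show /dev/mapper/mpathX
sudo lvs -o +devices    # per-LV view of the underlying PVs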

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-24 Thread Steve Fisher
Output from before and after commands is attached.

I'm pretty sure you're right about the LVM device filter; I figured
that setting the scsi-timeout to 90 seconds (much longer than the TUR
path checker interval, which runs every 20 seconds) would be enough to
ride out the device failover.

It's as if dm-4 is using sdk (for example), but when this changes,
either the journal is somehow stuck trying to update via the failed
path, or dm-4 gets removed from the virtual device list because sdk is
no longer there, instead of being reinstated with a new path.

--


(Yes, dm-4 is where the xfs filesystem is (mpath0).  The multipathed dm- 
entries are currently dm-5, dm-6 and dm-7.)


root@hostname:/etc# multipath -ll | grep dm-
mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
mpath1 (360060e80058a07008a07000e) dm-5 HITACHI ,OPEN-V
mpath0 (360060e80058a07008a072417) dm-6 HITACHI ,OPEN-V

root@hostname:/etc# pvscan
  PV /dev/mapper/mpath2   VG test2            lvm2 [4.00 GiB / 0    free]
  PV /dev/mapper/mpath0   VG mvsan            lvm2 [500.00 GiB / 0    free]
  PV /dev/mapper/mpath1   VG test             lvm2 [4.00 GiB / 0    free]
  PV /dev/sda1            VG rgrprod-proc03   lvm2 [135.97 GiB / 108.03 GiB free]
  Total: 4 [643.96 GiB] / in use: 4 [643.96 GiB] / in no VG: 0 [0   ]

/dev/mapper/test2-lvol0 on /tmp/test2 type xfs (rw)
/dev/mapper/test-lvol0 on /tmp/test type ext4 (rw)
/dev/mapper/mvsan-san on /srv/mysql type xfs (rw)

---

Interestingly, this is before failure:

root@hostname:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | 
sort -t '/' --key=13
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-2
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-3
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-4
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:1/block/sdi/holders/dm-5
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:1/block/sdl/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:0/block/sdh/holders/dm-6
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:0/block/sdk/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:2/block/sdg/holders/dm-7
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-0/target6:0:0/6:0:0:2/block/sdj/holders/dm-7
/sys/devices/pci:00/:00:09.0/:24:00.0/host6/rport-6:0-1/target6:0:1/6:0:1:2/block/sdm/holders/dm-7


After failure, that same listing looks like:

root@hostname:/etc# find /sys/devices -name *dm-* -print | grep -v virtual | 
sort -t '/' --key=13
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-0
/sys/devices/pci:00/:00:1c.0/:01:00.0/host4/target4:2:0/4:2:0:0/block/sda/sda1/holders/dm-1
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:1/block/sdc/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:1/block/sdf/holders/dm-5
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:0/block/sdb/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:0/block/sde/holders/dm-6
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-0/target5:0:0/5:0:0:2/block/sdd/holders/dm-7
/sys/devices/pci:00/:00:07.0/:1f:00.0/host5/rport-5:0-1/target5:0:1/5:0:1:2/block/sdg/holders/dm-7

Which I suppose is to be expected - except I'm guessing that the
dm-(2,3,4) devices shouldn't disappear; they should be remapped to the
new sdX paths via the /dev/mapper/mpath(0,1,2) devices ... or however
that path failover is meant to work in LVM.
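
For what it's worth, a quicker way to eyeball the same dm-to-sd
relationships (same information as the find above, just terser) is
something like:

for d in /sys/block/dm-*; do echo "$d -> $(ls $d/slaves 2>/dev/null)"; done
sudo dmsetup ls --tree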

** Attachment added: before and after tests
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3233696/+files/240712_1445.tgz

-- 
You received this bug 

[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-23 Thread Steve Fisher
It looks like all the /dev/mapper/mpathX targets, and the remaining
unique 'sd' paths (with one fabric disabled), were all still readable
after the path shutdown, but the underlying LVs somehow remained
deactivated once the paths disappeared.

root@hostname-03:/srv/mysql# dmesg
snip
[1039812.298161] I/O error in filesystem (dm-4) meta-data dev dm-4 block 
0x1f414af0   (xlog_iodone) error 5 buf count 3072
[1039812.311419] xfs_force_shutdown(dm-4,0x2) called from line 1043 of file 
/build/buildd/linux-2.6.32/fs/xfs/xfs_log.c.  Return address = 
0xa0162516
[1039812.311433] Filesystem dm-4: Log I/O Error Detected.  Shutting down 
filesystem: dm-4
[1039812.326038] Please umount the filesystem, and rectify the problem(s)

root@hostname:/srv/mysql# ls -l
ls: cannot open directory .: Input/output error

root@hostname:/srv/mysql#  lvs -o lv_attr
  /dev/test2/lvol0: read failed after 0 of 2048 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/mvsan/san: read failed after 0 of 2048 at 0: Input/output error
  Attr  
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
  -wi-ao
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath
mpath0  mpath1  mpath2  
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath0 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0191405 s, 267 kB/s
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath1 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00693051 s, 739 kB/s
root@hostname:/srv/mysql# dd if=/dev/mapper/mpath2 of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00559024 s, 916 kB/s

-

root@hostname:/srv/mysql# ls -l /dev/sd*
brw-rw 1 root disk 8,  0 2012-07-11 14:24 /dev/sda
brw-rw 1 root disk 8,  1 2012-07-11 14:24 /dev/sda1
brw-rw 1 root disk 8, 16 2012-07-11 14:24 /dev/sdb
brw-rw 1 root disk 8, 32 2012-07-11 14:24 /dev/sdc
brw-rw 1 root disk 8, 48 2012-07-11 14:24 /dev/sdd
brw-rw 1 root disk 8, 64 2012-07-11 14:24 /dev/sde
brw-rw 1 root disk 8, 80 2012-07-11 14:24 /dev/sdf
brw-rw 1 root disk 8, 96 2012-07-11 14:24 /dev/sdg
root@hostname:/srv/mysql# dd if=/dev/sdb of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 5.0139e-05 s, 102 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdc of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0726e-05 s, 126 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdd of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.0935e-05 s, 125 MB/s
root@hostname:/srv/mysql# dd if=/dev/sde of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.2647e-05 s, 120 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdf of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.1718e-05 s, 123 MB/s
root@hostname:/srv/mysql# dd if=/dev/sdg of=/dev/null count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 4.3425e-05 s, 118 MB/s
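
For the record, getting back to a working state after one of these
events seems to need manual intervention rather than happening by
itself - roughly along these lines (a sketch only, using the VG and
mount names from above; XFS replays its log at the next mount):

sudo umount /srv/mysql                 # clear the shut-down XFS instance
sudo vgchange -ay mvsan                # re-activate the LVs once paths are back
sudo mount /dev/mvsan/san /srv/mysql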

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-16 Thread Steve Fisher
To date, the issue hasn't been observed on the two physical hosts we
have running 10.04.1 LTS with the same multipath-tools version, which
certainly raises a flag.

They are being used as mission-critical production database servers,
so I'm scheduling a window in which we will be able to confirm whether
or not the issue manifests on those hosts too.

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-10 Thread Steve Fisher
** Attachment removed: strace_mpathd.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218020/+files/strace_mpathd.log

** Attachment added: strace_mpathd.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218069/+files/strace_mpathd.log

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-10 Thread Steve Fisher
-- The lvm.conf blacklist should be fine; it searches /dev but filters
all /dev/sd.*

I'll attach those files momentarily.

I didn't see any kpartx processes running at all; there wasn't anything
in an uninterruptible sleep state ('D') either.
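
For anyone repeating the check, a one-liner along these lines lists
anything stuck in 'D' state (plain ps/awk, nothing exotic):

ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'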

Listing all kernel and some additional processes here (I've removed
things like the apache/php processes and certain publicly identifiable
entries that aren't relevant):

USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root 1  0.6  0.0  23828  2064 ?Ss   14:59   0:04 /sbin/init
root 2  0.0  0.0  0 0 ?S14:59   0:00 [kthreadd]
root 3  0.0  0.0  0 0 ?S14:59   0:00 [migration/0]
root 4  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/0]
root 5  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/0]
root 6  0.0  0.0  0 0 ?S14:59   0:00 [migration/1]
root 7  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/1]
root 8  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/1]
root 9  0.0  0.0  0 0 ?S14:59   0:00 [migration/2]
root10  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/2]
root11  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/2]
root12  0.0  0.0  0 0 ?S14:59   0:00 [migration/3]
root13  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/3]
root14  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/3]
root15  0.0  0.0  0 0 ?S14:59   0:00 [migration/4]
root16  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/4]
root17  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/4]
root18  0.0  0.0  0 0 ?S14:59   0:00 [migration/5]
root19  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/5]
root20  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/5]
root21  0.0  0.0  0 0 ?S14:59   0:00 [migration/6]
root22  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/6]
root23  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/6]
root24  0.0  0.0  0 0 ?S14:59   0:00 [migration/7]
root25  0.1  0.0  0 0 ?S14:59   0:01 [ksoftirqd/7]
root26  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/7]
root27  0.0  0.0  0 0 ?S14:59   0:00 [migration/8]
root28  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/8]
root29  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/8]
root30  0.0  0.0  0 0 ?S14:59   0:00 [migration/9]
root31  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/9]
root32  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/9]
root33  0.0  0.0  0 0 ?S14:59   0:00 [migration/10]
root34  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/10]
root35  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/10]
root36  0.0  0.0  0 0 ?S14:59   0:00 [migration/11]
root37  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/11]
root38  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/11]
root39  0.0  0.0  0 0 ?S14:59   0:00 [migration/12]
root40  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/12]
root41  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/12]
root42  0.0  0.0  0 0 ?S14:59   0:00 [migration/13]
root43  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/13]
root44  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/13]
root45  0.0  0.0  0 0 ?S14:59   0:00 [migration/14]
root46  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/14]
root47  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/14]
root48  0.0  0.0  0 0 ?S14:59   0:00 [migration/15]
root49  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/15]
root50  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/15]
root51  0.0  0.0  0 0 ?S14:59   0:00 [migration/16]
root52  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/16]
root53  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/16]
root54  0.0  0.0  0 0 ?S14:59   0:00 [migration/17]
root55  0.0  0.0  0 0 ?S14:59   0:00 [ksoftirqd/17]
root56  0.0  0.0  0 0 ?S14:59   0:00 [watchdog/17]
root57  0.0  0.0  0 0 ?S14:59   0:00 [migration/18]
root58  0.0  0.0 

[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-10 Thread Steve Fisher
Storage devices:

$ sudo lsscsi
[0:0:0:0]    cd/dvd  TSSTcorp CDDVDW TS-L633F  IT03  /dev/sr0
[4:2:0:0]    disk    IBM      ServeRAID M5015  2.12  /dev/sda
[5:0:0:0]    disk    HITACHI  OPEN-V           6008  /dev/sdb
[5:0:0:1]    disk    HITACHI  OPEN-V           6008  /dev/sdc
[5:0:0:2]    disk    HITACHI  OPEN-V           6008  /dev/sdd
[5:0:1:0]    disk    HITACHI  OPEN-V           6008  /dev/sde
[5:0:1:1]    disk    HITACHI  OPEN-V           6008  /dev/sdf
[5:0:1:2]    disk    HITACHI  OPEN-V           6008  /dev/sdg
[6:0:0:0]    disk    HITACHI  OPEN-V           6008  /dev/sdh
[6:0:0:1]    disk    HITACHI  OPEN-V           6008  /dev/sdi
[6:0:0:2]    disk    HITACHI  OPEN-V           6008  /dev/sdj
[6:0:1:0]    disk    HITACHI  OPEN-V           6008  /dev/sdk
[6:0:1:1]    disk    HITACHI  OPEN-V           6008  /dev/sdl
[6:0:1:2]    disk    HITACHI  OPEN-V           6008  /dev/sdm

Zoning:

- There are two HBAs. Each HBA is plugged into a separate Brocade DCX
  switch, and each switch is connected to a different fabric (Fabric A
  and Fabric B respectively).
- The storage array presenting the LUNs is also directly connected to
  the same switches, in each fabric via 4 ports (array ports 1A,3A,6A,8A
  - Fabric A; ports 2A,4A,5A,8A - Fabric B).
- Each HBA is zoned so as to see two storage array ports (2 in each
  fabric). Basically this means that when I disable the FC port an HBA
  is connected to, it loses two paths to the LUN.

HBA1 - DCX 1 (Fabric A) - 3A,6A
HBA2 - DCX 2 (Fabric B) - 4A,5A

Blacklisting:

Thanks for pointing this out. (Note that I only added the 'SMART'
vendor/product entry after the problem first appeared, in an effort to
silence the warning about a USB flash storage device that showed up in
the multipath -ll output. It _shouldn't_ be a factor.)
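
If the vendor/product match ever turns out to be too broad, an
alternative would presumably be to blacklist the stick by wwid instead
- something like the below (sdX standing in for whatever node it
appears as), then adding that value as another wwid line in the
blacklist section:

sudo /lib/udev/scsi_id -g -u -d /dev/sdX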

Fixed now, initrd updated.  Reboot/testing later today (likely in about
3 hours' time).

$ echo show config | sudo multipathd -k
multipathd show config
defaults {
user_friendly_names yes
}
blacklist {
devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
devnode ^hd[a-z]
devnode ^dcssblk[0-9]*
devnode ^cciss!c[0-9]d[0-9]*
wwid 3600605b0039afe20ff54052e7d38
device {
vendor SMART
product SMART
}
device {
vendor DGC
product LUNZ
}
device {
vendor IBM
product S/390.*
}
}
blacklist_exceptions {
}
--snip

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-10 Thread Steve Fisher
I note this segfault has appeared in dmesg on the last couple of boots.

[  304.214560] multipathd[3083]: segfault at a ip 7f5eb838798a sp
7fffb225ebb0 error 4 in libc-2.11.1.so[7f5eb831+17a000]
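
If it would help with pinning down the segfault, I can also run the
daemon in the foreground with verbosity turned up and capture what it
prints around the failover - something like this, as I understand the
options:

sudo service multipath-tools stop
sudo multipathd -d -v3 2>&1 | tee /tmp/multipathd_fg.log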

After adjusting the blacklist, updating initrd and rebooting, the same
behaviour is exhibited when I shut the active FC port:

[  429.234838] qla2xxx :24:00.0: LOOP DOWN detected (2 3 0).
[  459.165399]  rport-6:0-1: blocked FC remote port time out: removing target 
and saving binding
[  459.165875]  rport-6:0-0: blocked FC remote port time out: removing target 
and saving binding
[  459.166496] device-mapper: multipath: Failing path 8:112.
[  459.166573] device-mapper: multipath: Failing path 8:128.
[  459.166765] device-mapper: multipath: Failing path 8:144.
[  459.166831] device-mapper: multipath: Failing path 8:160.
[  459.166914] device-mapper: multipath: Failing path 8:176.
[  459.166965] device-mapper: multipath: Failing path 8:192.
[  459.219341] sd 6:0:1:0: [sdk] Synchronizing SCSI cache
[  459.219374] sd 6:0:1:0: [sdk] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.220728] device-mapper: multipath: Failing path 8:112.
[  459.299111] sd 6:0:1:1: [sdl] Synchronizing SCSI cache
[  459.299138] sd 6:0:1:1: [sdl] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.300305] device-mapper: multipath: Failing path 8:128.
[  459.378913] sd 6:0:1:2: [sdm] Synchronizing SCSI cache
[  459.378943] sd 6:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.381678] device-mapper: multipath: Failing path 8:144.
[  459.430852] sd 6:0:0:1: [sdi] Unhandled error code
[  459.430854] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.430857] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[  459.430862] end_request: I/O error, dev sdi, sector 8388480
[  459.431165] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431166] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.431168] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[  459.431173] end_request: I/O error, dev sdi, sector 8388592
[  459.431473] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431475] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.431477] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.431482] end_request: I/O error, dev sdi, sector 0
[  459.431767] sd 6:0:0:1: [sdi] Unhandled error code
[  459.431770] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.431772] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[  459.431776] end_request: I/O error, dev sdi, sector 8
[  459.432072] sd 6:0:0:1: [sdi] Unhandled error code
[  459.432073] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.432075] sd 6:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.432081] end_request: I/O error, dev sdi, sector 0
[  459.432397] sd 6:0:0:2: [sdj] Unhandled error code
[  459.432399] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.432401] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff 80 00 00 08 00
[  459.432406] end_request: I/O error, dev sdj, sector 8388480
[  459.432712] sd 6:0:0:2: [sdj] Unhandled error code
[  459.432714] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.432717] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 7f ff f0 00 00 08 00
[  459.432723] end_request: I/O error, dev sdj, sector 8388592
[  459.433034] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433036] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.433039] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.433045] end_request: I/O error, dev sdj, sector 0
[  459.433345] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433347] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.433350] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
[  459.433356] end_request: I/O error, dev sdj, sector 8
[  459.433672] sd 6:0:0:2: [sdj] Unhandled error code
[  459.433674] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.433677] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.433684] end_request: I/O error, dev sdj, sector 0
[  459.456208] sd 6:0:0:0: [sdh] Synchronizing SCSI cache
[  459.456236] sd 6:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.465583] sd 6:0:0:2: [sdj] Unhandled error code
[  459.465585] sd 6:0:0:2: [sdj] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.465588] sd 6:0:0:2: [sdj] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  459.465593] end_request: I/O error, dev sdj, sector 0
[  459.548504] sd 6:0:0:1: [sdi] Synchronizing SCSI cache
[  459.548540] sd 6:0:0:1: [sdi] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[  459.625817] sd 6:0:0:2: [sdj] Synchronizing SCSI cache
[  

[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-10 Thread Steve Fisher
** Attachment added: strace_mpathd_110712.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3219323/+files/strace_mpathd_110712.log

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-10 Thread Steve Fisher
** Attachment added: multipathd_stdout_110712.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3219324/+files/multipathd_stdout_110712.log

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
Hi Peter,

Thanks for picking this up.

Note: I ran an 'update-initramfs -u -k all' and rebooted, just for good
measure, before proceeding. There's some output regarding a missing
firmware file; I'm not sure whether it's relevant:

root@rgrprod-pmdh-proc-03:/etc# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-2.6.32-41-server
W: Possible missing firmware /lib/firmware/ql8100_fw.bin for module qla2xxx
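
For reference, a quick way to check which QLogic part is actually
fitted (plain pciutils; as I understand it, ql8100_fw.bin only matters
for 8100-series CNAs, so the warning is probably benign here):

lspci -nn | grep -iE 'fibre|qlogic'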

Oddly, multipathd didn't log a single thing to stderr.


** Attachment added: strace_mpathd.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218020/+files/strace_mpathd.log

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
note: multipath_stderr.log exists, but is empty.

** Attachment added: multipathd_stdout.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218021/+files/multipathd_stdout.log

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
user@hostname:~$ sudo dmsetup table
mpath2: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:48 
1000 8:96 1000 8:144 1000 8:192 1000 
mpath1: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:128 
1000 8:32 1000 8:80 1000 8:176 1000 
mpath0: 0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:160 
1000 8:16 1000 8:64 1000 8:112 1000 
test2-lvol0: 0 8380416 linear 8:192 384
test-lvol0: 0 8380416 linear 8:176 384
vgname2-san: 0 1048567808 linear 8:160 384
vgname-slash: 0 48824320 linear 8:1 384
vgname-swap: 0 9764864 linear 8:1 48824704
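
For anyone else reading along, my rough decoding of the mpath2 line
against the dm-multipath table format (start, length, target, then
target parameters - treat this as my interpretation rather than
gospel):

0 8388608 multipath   -> map starts at sector 0 and is 8388608 sectors long
1 queue_if_no_path    -> one feature arg: queue I/O instead of failing when no paths remain
0                     -> no hardware-handler args
1 1                   -> one priority group, start with group 1
round-robin 0         -> path selector, with zero selector args
4 1                   -> four paths in the group, one argument per path
8:48 1000 ...         -> each path as major:minor plus its repeat count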



user@hostname:~$ sudo echo show config | sudo multipathd -k
multipathd show config
defaults {
user_friendly_names yes
}
blacklist {
devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
devnode ^hd[a-z]
devnode ^dcssblk[0-9]*
devnode ^cciss!c[0-9]d[0-9]*
wwid 3600605b0039afe20ff54052e7d38
device {
vendor DGC
product LUNZ
}
device {
vendor IBM
product S/390.*
}
}
blacklist_exceptions {
}
devices {
device {
vendor APPLE*
product Xserve RAID 
path_grouping_policy multibus
}
device {
vendor 3PARdata
product VV
path_grouping_policy multibus
}
device {
vendor DEC
product HSG80
path_grouping_policy group_by_prio
path_checker hp_sw
features 1 queue_if_no_path
hardware_handler 1 hp_sw
prio_callout /sbin/mpath_prio_hp_sw /dev/%n
}
device {
vendor HP
product A6189A
path_grouping_policy multibus
path_checker readsector0
}
device {
vendor (COMPAQ|HP)
product (MSA|HSV)1.0.*
path_grouping_policy group_by_prio
path_checker hp_sw
features 1 queue_if_no_path
hardware_handler 1 hp_sw
prio_callout /sbin/mpath_prio_hp_sw /dev/%n
}
device {
vendor HP
product MSA VOLUME
path_grouping_policy group_by_prio
path_checker tur
prio_callout /sbin/mpath_prio_alua /dev/%n
failback immediate
}
device {
vendor (COMPAQ|HP)
product (MSA|HSV)1.1.*
path_grouping_policy group_by_prio
path_checker tur
prio_callout /sbin/mpath_prio_alua /dev/%n
failback immediate
}
device {
vendor HP
product HSV2.*
path_grouping_policy group_by_prio
path_checker tur
prio_callout /sbin/mpath_prio_alua /dev/%n
failback immediate
}
device {
vendor HP
product LOGICAL VOLUME.*
path_grouping_policy multibus
getuid_callout /lib/udev/scsi_id -n -g -u -d /dev/%n
path_checker tur
}
device {
vendor DDN
product SAN DataDirector
path_grouping_policy multibus
}
device {
vendor EMC
product SYMMETRIX
path_grouping_policy multibus
getuid_callout /lib/udev/scsi_id -g -u -ppre-spc3-83 -d /dev/%n
path_checker readsector0
}
device {
vendor DGC
product .*
product_blacklist LUNZ
path_grouping_policy group_by_prio
path_checker emc_clariion
features 1 queue_if_no_path
hardware_handler 1 emc
prio_callout /sbin/mpath_prio_emc /dev/%n
failback immediate
no_path_retry 60
}
device {
vendor FSC
product CentricStor
path_grouping_policy group_by_serial
path_checker readsector0
}
device {
vendor (HITACHI|HP)
product OPEN-.*
path_grouping_policy multibus
path_checker tur
features 1 queue_if_no_path
}
device {
vendor HITACHI
product DF.*
path_grouping_policy group_by_prio
path_checker tur
features 1 queue_if_no_path
prio_callout /sbin/mpath_prio_hds_modular /dev/%n
failback immediate
}
device {
vendor IBM
product ProFibre 4000R
path_grouping_policy multibus
path_checker readsector0
}

[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
** Attachment added: multipath.conf
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218022/+files/multipath.conf

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
** Attachment added: lvm.conf
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218023/+files/lvm.conf

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
Note: /dev/sda is a RAID volume used for the root VG; the filter is
actually r|/dev/sd[b-z]|
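In lvm.conf terms that corresponds to a devices-section filter roughly
like the sketch below. This is illustrative only, not a copy of the
attached lvm.conf; in particular the mapper pattern is an assumption
about how the multipath devices get admitted:

    # /etc/lvm/lvm.conf -- illustrative sketch, not the attached file
    devices {
        # accept the dm-multipath devices and the local RAID disk (sda),
        # reject the raw per-path sdb..sdz devices so LVM never scans them
        filter = [ "a|/dev/mapper/mpath.*|", "a|/dev/sda|", "r|/dev/sd[b-z]|" ]
    }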

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
Hi Peter,

Thanks for picking this up.

Note: I ran an 'update-initramfs -u -k all' and rebooted just for good
measure before proceeding.  There's some output regarding a missing
firmware file; I'm not sure it's relevant:

root@rgrprod-pmdh-proc-03:/etc# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-2.6.32-41-server
W: Possible missing firmware /lib/firmware/ql8100_fw.bin for module qla2xxx

Oddly, multipathd didn't log a single thing to stderr.
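The exact capture command isn't quoted in this thread; something along
these lines would produce the three attached files (illustrative only,
not necessarily the invocation actually used):

    # illustrative only -- one way to produce strace_mpathd.log plus the
    # stdout/stderr logs; -d keeps multipathd in the foreground
    strace -f -tt -o strace_mpathd.log multipathd -d \
        > multipathd_stdout.log 2> multipath_stderr.log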


** Attachment added: strace_mpathd.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218020/+files/strace_mpathd.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
note: multipath_stderr.log exists, but is empty.

** Attachment added: multipathd_stdout.log
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218021/+files/multipathd_stdout.log

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
user@hostname:~$ sudo dmsetup table
mpath2: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:48 
1000 8:96 1000 8:144 1000 8:192 1000 
mpath1: 0 8388608 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:128 
1000 8:32 1000 8:80 1000 8:176 1000 
mpath0: 0 1048576000 multipath 1 queue_if_no_path 0 1 1 round-robin 0 4 1 8:160 
1000 8:16 1000 8:64 1000 8:112 1000 
test2-lvol0: 0 8380416 linear 8:192 384
test-lvol0: 0 8380416 linear 8:176 384
vgname2-san: 0 1048567808 linear 8:160 384
vgname-slash: 0 48824320 linear 8:1 384
vgname-swap: 0 9764864 linear 8:1 48824704



user@hostname:~$ sudo echo show config | sudo multipathd -k
multipathd show config
defaults {
user_friendly_names yes
}
blacklist {
devnode ^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*
devnode ^hd[a-z]
devnode ^dcssblk[0-9]*
devnode ^cciss!c[0-9]d[0-9]*
wwid 3600605b0039afe20ff54052e7d38
device {
vendor DGC
product LUNZ
}
device {
vendor IBM
product S/390.*
}
}
blacklist_exceptions {
}
devices {
device {
vendor APPLE*
product Xserve RAID 
path_grouping_policy multibus
}
device {
vendor 3PARdata
product VV
path_grouping_policy multibus
}
device {
vendor DEC
product HSG80
path_grouping_policy group_by_prio
path_checker hp_sw
features 1 queue_if_no_path
hardware_handler 1 hp_sw
prio_callout /sbin/mpath_prio_hp_sw /dev/%n
}
device {
vendor HP
product A6189A
path_grouping_policy multibus
path_checker readsector0
}
device {
vendor (COMPAQ|HP)
product (MSA|HSV)1.0.*
path_grouping_policy group_by_prio
path_checker hp_sw
features 1 queue_if_no_path
hardware_handler 1 hp_sw
prio_callout /sbin/mpath_prio_hp_sw /dev/%n
}
device {
vendor HP
product MSA VOLUME
path_grouping_policy group_by_prio
path_checker tur
prio_callout /sbin/mpath_prio_alua /dev/%n
failback immediate
}
device {
vendor (COMPAQ|HP)
product (MSA|HSV)1.1.*
path_grouping_policy group_by_prio
path_checker tur
prio_callout /sbin/mpath_prio_alua /dev/%n
failback immediate
}
device {
vendor HP
product HSV2.*
path_grouping_policy group_by_prio
path_checker tur
prio_callout /sbin/mpath_prio_alua /dev/%n
failback immediate
}
device {
vendor HP
product LOGICAL VOLUME.*
path_grouping_policy multibus
getuid_callout /lib/udev/scsi_id -n -g -u -d /dev/%n
path_checker tur
}
device {
vendor DDN
product SAN DataDirector
path_grouping_policy multibus
}
device {
vendor EMC
product SYMMETRIX
path_grouping_policy multibus
getuid_callout /lib/udev/scsi_id -g -u -ppre-spc3-83 -d /dev/%n
path_checker readsector0
}
device {
vendor DGC
product .*
product_blacklist LUNZ
path_grouping_policy group_by_prio
path_checker emc_clariion
features 1 queue_if_no_path
hardware_handler 1 emc
prio_callout /sbin/mpath_prio_emc /dev/%n
failback immediate
no_path_retry 60
}
device {
vendor FSC
product CentricStor
path_grouping_policy group_by_serial
path_checker readsector0
}
device {
vendor (HITACHI|HP)
product OPEN-.*
path_grouping_policy multibus
path_checker tur
features 1 queue_if_no_path
}
device {
vendor HITACHI
product DF.*
path_grouping_policy group_by_prio
path_checker tur
features 1 queue_if_no_path
prio_callout /sbin/mpath_prio_hds_modular /dev/%n
failback immediate
}
device {
vendor IBM
product ProFibre 4000R
path_grouping_policy multibus
path_checker readsector0
}

[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
-- The lvm.conf blacklist should be fine: it scans /dev but filters out
all /dev/sd.* devices.

I'll attach those files momentarily.

I didn't see any kpartx processes running at all; there wasn't anything
in an uninterruptible sleep state ('D') either.
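For anyone wanting to double-check that, something along these lines
will list only tasks stuck in uninterruptible sleep (illustrative; the
fields are standard ps format specifiers):

    # print the header plus any process whose state field contains 'D'
    ps axo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /D/'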

Listing all kernel threads and some additional processes here (I've
removed things like the apache/php processes and certain publicly
identifiable entries that aren't relevant):

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.6  0.0  23828  2064 ?        Ss   14:59   0:04 /sbin/init
root         2  0.0  0.0      0     0 ?        S    14:59   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/0]
root         6  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/1]
root         7  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/1]
root         8  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/1]
root         9  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/2]
root        10  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/2]
root        11  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/2]
root        12  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/3]
root        13  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/3]
root        14  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/3]
root        15  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/4]
root        16  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/4]
root        17  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/4]
root        18  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/5]
root        19  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/5]
root        20  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/5]
root        21  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/6]
root        22  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/6]
root        23  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/6]
root        24  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/7]
root        25  0.1  0.0      0     0 ?        S    14:59   0:01 [ksoftirqd/7]
root        26  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/7]
root        27  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/8]
root        28  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/8]
root        29  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/8]
root        30  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/9]
root        31  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/9]
root        32  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/9]
root        33  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/10]
root        34  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/10]
root        35  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/10]
root        36  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/11]
root        37  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/11]
root        38  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/11]
root        39  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/12]
root        40  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/12]
root        41  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/12]
root        42  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/13]
root        43  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/13]
root        44  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/13]
root        45  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/14]
root        46  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/14]
root        47  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/14]
root        48  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/15]
root        49  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/15]
root        50  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/15]
root        51  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/16]
root        52  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/16]
root        53  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/16]
root        54  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/17]
root        55  0.0  0.0      0     0 ?        S    14:59   0:00 [ksoftirqd/17]
root        56  0.0  0.0      0     0 ?        S    14:59   0:00 [watchdog/17]
root        57  0.0  0.0      0     0 ?        S    14:59   0:00 [migration/18]
root        58  0.0  0.0

[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
** Attachment added: multipath.conf
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218022/+files/multipath.conf

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
** Attachment added: lvm.conf
   
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+attachment/3218023/+files/lvm.conf

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] Re: Cannot read superblock after FC multipath failover

2012-07-09 Thread Steve Fisher
Note: /dev/sda is a RAID volume used for the root VG; the filter is
actually r|/dev/sd[b-z]|

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] Re: Cannot read superblock after fibre path failover

2012-07-08 Thread Steve Fisher
** Summary changed:

- Multiple filesystems cannot read superblock after fibre path failover
+ Cannot read superblock after fibre path failover

** Summary changed:

- Cannot read superblock after fibre path failover
+ Cannot read superblock after FC multipath failover

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1020436] Re: Cannot read superblock after fibre path failover

2012-07-08 Thread Steve Fisher
** Summary changed:

- Multiple filesystems cannot read superblock after fibre path failover
+ Cannot read superblock after fibre path failover

** Summary changed:

- Cannot read superblock after fibre path failover
+ Cannot read superblock after FC multipath failover

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1020436/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1020436] [NEW] Multiple filesystems cannot read superblock after fibre path failover

2012-07-03 Thread Steve Fisher
Public bug reported:

This might not be multipath-tools, but it's the likeliest candidate.

If fibre channel connectivity is disrupted on an active path, I can see
the multipath daemon rejuggling active paths as expected.  However,
instead of utilising a new path, the server continues trying active i/o
down the same path until it fails, and the superblock on the lun is
marked unreadable.

The luns simply cannot be remounted even after the paths are recovered.
However, after a simple reboot, the luns are again readable _regardless_
of the visible path.  (Eg I shut the active FC port then rebooted, and
the storage was mountable per normal over a different path).

I've replicated this behaviour using both xfs and ext4 filesystems, on
multiple different luns presented to the server.

The luns are visible over four FC paths, and the /dev/mapper/mpathX
identity is used as the physical volume for an LVM2 volume group, into
which the logical volume(s) and filesystem are created.
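For context, this is roughly how such a stack is put together; the
commands below are illustrative, with placeholder names and sizes
rather than the exact commands used on this host:

    # multipath device -> LVM PV -> VG -> LV -> filesystem (illustrative)
    pvcreate /dev/mapper/mpath1
    vgcreate test /dev/mapper/mpath1
    lvcreate -L 4G -n lvol0 test
    mkfs.ext4 /dev/test/lvol0        # some LUNs carry xfs instead
    mount /dev/test/lvol0 /mnt/test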

root@host:~# lsb_release -rd
Description:Ubuntu 10.04.3 LTS
Release:10.04

As you can see here, the /dev/test and /dev/test2 volume groups are both
active, yet showing i/o errors after a pathing fail.

root@host:~# lvscan
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  ACTIVE'/dev/test2/lvol0' [4.00 GiB] inherit
  ACTIVE'/dev/test/lvol0' [4.00 GiB] inherit
  ACTIVE'/dev/mvsan/san' [500.00 GiB] inherit

root@host:~# pvscan
  /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
  /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  PV /dev/mapper/mpath2   VG test2lvm2 [4.00 GiB / 0free]
  PV /dev/mapper/mpath1   VG test lvm2 [4.00 GiB / 0free]
  PV /dev/mapper/mpath0   VG mvsanlvm2 [500.00 GiB / 0free]

root@host:~# multipath -ll
mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
[size=4.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 6:0:0:2 sdd 8:48  [active][ready]
 \_ 6:0:1:2 sdg 8:96  [active][ready]
 \_ 7:0:0:2 sdj 8:144 [active][ready]
 \_ 7:0:1:2 sdp 8:240 [active][ready]
mpath1 (360060e80058a07008a07000e) dm-6 HITACHI ,OPEN-V
[size=4.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 6:0:0:1 sdc 8:32  [active][ready]
 \_ 6:0:1:1 sdf 8:80  [active][ready]
 \_ 7:0:0:1 sdi 8:128 [active][ready]
 \_ 7:0:1:1 sdo 8:224 [active][ready]
mpath0 (360060e80058a07008a072417) dm-5 HITACHI ,OPEN-V
[size=500G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 6:0:0:0 sdb 8:16  [active][ready]
 \_ 6:0:1:0 sde 8:64  [active][ready]
 \_ 7:0:0:0 sdh 8:112 [active][ready]
 \_ 7:0:1:0 sdk 8:160 [active][ready]

The dmesg output is very instructive, but still doesn't help with the
root cause per se.  I set the SCSI timeouts nice and high (90 seconds)
to accommodate any potential delays in path failover.  The path checks
(test unit ready/TUR) occur about every 20 seconds.
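For reference, those two knobs live in different places; they are
typically set along these lines (illustrative values matching the ones
described above, not the exact configuration from this host):

    # per-path SCSI command timeout, repeated for each sdX path device
    echo 90 > /sys/block/sdh/device/timeout

    # /etc/multipath.conf
    defaults {
        polling_interval 20    # seconds between path-checker (TUR) runs
    }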

See below:

[ 2192.891962] qla2xxx :24:00.0: LOOP DOWN detected (2 3 0).
[ .820850]  rport-7:0-0: blocked FC remote port time out: removing target 
and saving binding
[ .821297]  rport-7:0-1: blocked FC remote port time out: removing target 
and saving binding
[ .827286] device-mapper: multipath: Failing path 8:112.
[ .827369] device-mapper: multipath: Failing path 8:160.

So this is all as expected.

Then:

[ .924665] sd 7:0:0:0: [sdh] Synchronizing SCSI cache
[ .924696] sd 7:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[ .926071] device-mapper: multipath: Failing path 8:160.
[ .976444] sd 7:0:1:2: [sdm] Unhandled error code
[ .976446] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[ .976449] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 00 00 00 01 00
[ .976455] end_request: I/O error, dev sdm, sector 8380672
[ .980258] sd 7:0:1:2: [sdm] Unhandled error code
[ .980260] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[ .980262] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 70 00 00 01 00
[ .980267] end_request: I/O error, dev sdm, sector 8380784
[ .984139] sd 7:0:1:2: [sdm] 

[Bug 1020436] Re: Multiple filesystems cannot read superblock after fibre path failover

2012-07-03 Thread Steve Fisher
** Description changed:

  This might not be multipath-tools, but it's the likeliest candidate.
  
  If fibre channel connectivity is disrupted on an active path, I can see
  the multipath daemon rejuggling active paths as expected.  However
  instead of utilising a new path, the server continues trying active i/o
  down the same path until it fails, and the superblock on the lun is
  marked unreadable.
  
  The luns simply cannot be remounted even after the paths are recovered.
  However, after a simple reboot, the luns are again readable _regardless_
  of the visible path.  (Eg I shut the active FC port then rebooted, and
  the storage was mountable per normal over a different path).
  
  I've replicated this behaviour using both xfs and ext4 filesystems, on
  multiple different luns presented to the server.
  
  The luns are visible over four FC paths, and the /dev/mapper/mpathX
  identity is used as the physical volume for an LVM2 volume group, into
  which the logical volume(s) and filesystem are created.
  
  root@host:~# lsb_release -rd
  Description:  Ubuntu 10.04.3 LTS
  Release:  10.04
  
  As you can see here, the /dev/test and /dev/test2 volume groups are both
  active, yet showing i/o errors after a pathing fail.
  
  root@host:~# lvscan
-   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   ACTIVE'/dev/test2/lvol0' [4.00 GiB] inherit
-   ACTIVE'/dev/test/lvol0' [4.00 GiB] inherit
-   ACTIVE'/dev/mvsan/san' [500.00 GiB] inherit
+   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   ACTIVE'/dev/test2/lvol0' [4.00 GiB] inherit
+   ACTIVE'/dev/test/lvol0' [4.00 GiB] inherit
+   ACTIVE'/dev/mvsan/san' [500.00 GiB] inherit
  
  root@host:~# pvscan
-   /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
-   /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
-   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   PV /dev/mapper/mpath2   VG test2lvm2 [4.00 GiB / 0free]
-   PV /dev/mapper/mpath1   VG test lvm2 [4.00 GiB / 0free]
-   PV /dev/mapper/mpath0   VG mvsanlvm2 [500.00 GiB / 0free]
+   /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
+   /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
+   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   PV /dev/mapper/mpath2   VG test2lvm2 [4.00 GiB / 0free]
+   PV /dev/mapper/mpath1   VG test lvm2 [4.00 GiB / 0free]
+   PV /dev/mapper/mpath0   VG mvsanlvm2 [500.00 GiB / 0free]
  
  root@host:~# multipath -ll
- mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
+ mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
  [size=4.0G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
-  \_ 6:0:0:2 sdd 8:48  [active][ready]
-  \_ 6:0:1:2 sdg 8:96  [active][ready]
-  \_ 7:0:0:2 sdj 8:144 [active][ready]
-  \_ 7:0:1:2 sdp 8:240 [active][ready]
- mpath1 (360060e80058a07008a07000e) dm-6 HITACHI ,OPEN-V
+  \_ 6:0:0:2 sdd 8:48  [active][ready]
+  \_ 6:0:1:2 sdg 8:96  [active][ready]
+  \_ 7:0:0:2 sdj 8:144 [active][ready]
+  \_ 7:0:1:2 sdp 8:240 [active][ready]
+ mpath1 (360060e80058a07008a07000e) dm-6 HITACHI ,OPEN-V
  [size=4.0G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
-  \_ 6:0:0:1 sdc 8:32  [active][ready]
-  \_ 6:0:1:1 sdf 8:80  [active][ready]
-  \_ 7:0:0:1 sdi 8:128 [active][ready]
-  \_ 7:0:1:1 sdo 8:224 [active][ready]
- mpath0 

[Bug 1020436] [NEW] Multiple filesystems cannot read superblock after fibre path failover

2012-07-03 Thread Steve Fisher
Public bug reported:

This might not be multipath-tools, but it's the likeliest candidate.

If fibre channel connectivity is disrupted on an active path, I can see
the multipath daemon rejuggling active paths as expected.  However,
instead of utilising a new path, the server continues trying active i/o
down the same path until it fails, and the superblock on the lun is
marked unreadable.

The luns simply cannot be remounted even after the paths are recovered.
However, after a simple reboot, the luns are again readable _regardless_
of the visible path.  (Eg I shut the active FC port then rebooted, and
the storage was mountable per normal over a different path).

I've replicated this behaviour using both xfs and ext4 filesystems, on
multiple different luns presented to the server.

The luns are visible over four FC paths, and the /dev/mapper/mpathX
identity is used as the physical volume for an LVM2 volume group, into
which the logical volume(s) and filesystem are created.

root@host:~# lsb_release -rd
Description:Ubuntu 10.04.3 LTS
Release:10.04

As you can see here, the /dev/test and /dev/test2 volume groups are both
active, yet showing i/o errors after a pathing fail.

root@host:~# lvscan
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  ACTIVE'/dev/test2/lvol0' [4.00 GiB] inherit
  ACTIVE'/dev/test/lvol0' [4.00 GiB] inherit
  ACTIVE'/dev/mvsan/san' [500.00 GiB] inherit

root@host:~# pvscan
  /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
  /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
  PV /dev/mapper/mpath2   VG test2lvm2 [4.00 GiB / 0free]
  PV /dev/mapper/mpath1   VG test lvm2 [4.00 GiB / 0free]
  PV /dev/mapper/mpath0   VG mvsanlvm2 [500.00 GiB / 0free]

root@host:~# multipath -ll
mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
[size=4.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 6:0:0:2 sdd 8:48  [active][ready]
 \_ 6:0:1:2 sdg 8:96  [active][ready]
 \_ 7:0:0:2 sdj 8:144 [active][ready]
 \_ 7:0:1:2 sdp 8:240 [active][ready]
mpath1 (360060e80058a07008a07000e) dm-6 HITACHI ,OPEN-V
[size=4.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 6:0:0:1 sdc 8:32  [active][ready]
 \_ 6:0:1:1 sdf 8:80  [active][ready]
 \_ 7:0:0:1 sdi 8:128 [active][ready]
 \_ 7:0:1:1 sdo 8:224 [active][ready]
mpath0 (360060e80058a07008a072417) dm-5 HITACHI ,OPEN-V
[size=500G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=4][active]
 \_ 6:0:0:0 sdb 8:16  [active][ready]
 \_ 6:0:1:0 sde 8:64  [active][ready]
 \_ 7:0:0:0 sdh 8:112 [active][ready]
 \_ 7:0:1:0 sdk 8:160 [active][ready]

The dmesg output is very instructive, but still doesn't help with the
root cause per se.  I set the SCSI timeouts nice and high (90 seconds)
to accommodate any potential delays in path failover.  The path checks
(test unit ready/TUR) occur about every 20 seconds.

See below:

[ 2192.891962] qla2xxx :24:00.0: LOOP DOWN detected (2 3 0).
[ .820850]  rport-7:0-0: blocked FC remote port time out: removing target 
and saving binding
[ .821297]  rport-7:0-1: blocked FC remote port time out: removing target 
and saving binding
[ .827286] device-mapper: multipath: Failing path 8:112.
[ .827369] device-mapper: multipath: Failing path 8:160.

So this is all as expected.

Then:

[ .924665] sd 7:0:0:0: [sdh] Synchronizing SCSI cache
[ .924696] sd 7:0:0:0: [sdh] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[ .926071] device-mapper: multipath: Failing path 8:160.
[ .976444] sd 7:0:1:2: [sdm] Unhandled error code
[ .976446] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[ .976449] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 00 00 00 01 00
[ .976455] end_request: I/O error, dev sdm, sector 8380672
[ .980258] sd 7:0:1:2: [sdm] Unhandled error code
[ .980260] sd 7:0:1:2: [sdm] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK
[ .980262] sd 7:0:1:2: [sdm] CDB: Read(10): 28 00 00 7f e1 70 00 00 01 00
[ .980267] end_request: I/O error, dev sdm, sector 8380784
[ .984139] sd 7:0:1:2: [sdm] 

[Bug 1020436] Re: Multiple filesystems cannot read superblock after fibre path failover

2012-07-03 Thread Steve Fisher
** Description changed:

  This might not be multipath-tools, but it's the likeliest candidate.
  
  If fibre channel connectivity is disrupted on an active path, I can see
  the multipath daemon rejuggling active paths as expected.  However
  instead of utilising a new path, the server continues trying active i/o
  down the same path until it fails, and the superblock on the lun is
  marked unreadable.
  
  The luns simply cannot be remounted even after the paths are recovered.
  However, after a simple reboot, the luns are again readable _regardless_
  of the visible path.  (Eg I shut the active FC port then rebooted, and
  the storage was mountable per normal over a different path).
  
  I've replicated this behaviour using both xfs and ext4 filesystems, on
  multiple different luns presented to the server.
  
  The luns are visible over four FC paths, and the /dev/mapper/mpathX
  identity is used as the physical volume for an LVM2 volume group, into
  which the logical volume(s) and filesystem are created.
  
  root@host:~# lsb_release -rd
  Description:  Ubuntu 10.04.3 LTS
  Release:  10.04
  
  As you can see here, the /dev/test and /dev/test2 volume groups are both
  active, yet showing i/o errors after a pathing fail.
  
  root@host:~# lvscan
-   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   ACTIVE'/dev/test2/lvol0' [4.00 GiB] inherit
-   ACTIVE'/dev/test/lvol0' [4.00 GiB] inherit
-   ACTIVE'/dev/mvsan/san' [500.00 GiB] inherit
+   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   ACTIVE'/dev/test2/lvol0' [4.00 GiB] inherit
+   ACTIVE'/dev/test/lvol0' [4.00 GiB] inherit
+   ACTIVE'/dev/mvsan/san' [500.00 GiB] inherit
  
  root@host:~# pvscan
-   /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
-   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
-   /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
-   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
-   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
-   PV /dev/mapper/mpath2   VG test2lvm2 [4.00 GiB / 0free]
-   PV /dev/mapper/mpath1   VG test lvm2 [4.00 GiB / 0free]
-   PV /dev/mapper/mpath0   VG mvsanlvm2 [500.00 GiB / 0free]
+   /dev/test2/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 4096: Input/output error
+   /dev/test2/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 4290707456: Input/output 
error
+   /dev/test/lvol0: read failed after 0 of 4096 at 4290764800: Input/output 
error
+   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 4096: Input/output error
+   /dev/test/lvol0: read failed after 0 of 4096 at 0: Input/output error
+   PV /dev/mapper/mpath2   VG test2lvm2 [4.00 GiB / 0free]
+   PV /dev/mapper/mpath1   VG test lvm2 [4.00 GiB / 0free]
+   PV /dev/mapper/mpath0   VG mvsanlvm2 [500.00 GiB / 0free]
  
  root@host:~# multipath -ll
- mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
+ mpath2 (360060e80058a07008a070023) dm-7 HITACHI ,OPEN-V
  [size=4.0G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
-  \_ 6:0:0:2 sdd 8:48  [active][ready]
-  \_ 6:0:1:2 sdg 8:96  [active][ready]
-  \_ 7:0:0:2 sdj 8:144 [active][ready]
-  \_ 7:0:1:2 sdp 8:240 [active][ready]
- mpath1 (360060e80058a07008a07000e) dm-6 HITACHI ,OPEN-V
+  \_ 6:0:0:2 sdd 8:48  [active][ready]
+  \_ 6:0:1:2 sdg 8:96  [active][ready]
+  \_ 7:0:0:2 sdj 8:144 [active][ready]
+  \_ 7:0:1:2 sdp 8:240 [active][ready]
+ mpath1 (360060e80058a07008a07000e) dm-6 HITACHI ,OPEN-V
  [size=4.0G][features=1 queue_if_no_path][hwhandler=0]
  \_ round-robin 0 [prio=4][active]
-  \_ 6:0:0:1 sdc 8:32  [active][ready]
-  \_ 6:0:1:1 sdf 8:80  [active][ready]
-  \_ 7:0:0:1 sdi 8:128 [active][ready]
-  \_ 7:0:1:1 sdo 8:224 [active][ready]
- mpath0