Alrighty.

[OK] lvm.conf device blacklist
[OK] multipath.conf, sans fix I already pointed out
You can ignore that FW blurb, it's just a script that shouldn't be so noisy.

failover logs *look good*

bug-1020436$ grep fail_path multipathd_stdout.log | sort -u
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:112
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:160
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:128
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:176
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:144
Jul 10 15:08:25 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:192

# About 3 minutes go by. Note that several devices are failed "twice".

# Jul 10 15:10:19 devices start to return

Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:16
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath0  NF  fail_path 8:64
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:32
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath1  NF  fail_path 8:80
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:48
Jul 10 15:11:50 | libdevmapper: ioctl/libdm-iface.c(1740): dm message mpath2  NF  fail_path 8:96
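
The two waves of fail_path messages above (15:08:25 and 15:11:50) can be pulled apart mechanically instead of by eye. A minimal sketch, assuming each log entry sits on a single line (not wrapped by the mailer); the function name group_fail_paths and the regular expression are my own, not part of multipath-tools:

```python
import re
from collections import defaultdict

# Matches the single-line form of the libdevmapper debug message, e.g.
# "Jul 10 15:08:25 | libdevmapper: ... dm message mpath0  NF  fail_path 8:112"
FAIL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8}) \| .*dm message (?P<map>\S+)"
    r"\s+\S+\s+fail_path (?P<dev>\d+:\d+)"
)

def group_fail_paths(lines):
    """Return {timestamp: {map_name: [major:minor, ...]}}, dropping repeats."""
    waves = defaultdict(lambda: defaultdict(list))
    for line in lines:
        m = FAIL_RE.match(line)
        if m is None:
            continue  # not a fail_path message
        devs = waves[m["ts"]][m["map"]]
        if m["dev"] not in devs:
            devs.append(m["dev"])
    return {ts: dict(maps) for ts, maps in waves.items()}
```

Feeding it the lines of multipathd_stdout.log should give one entry per timestamp, each listing which paths were failed on which map.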

And here is the same failover according to udev. Note that I re-sequenced
the grep output and grouped it by mpathX.

bug-1020436$ grep -A4 PATH_FAILED multipathd_stdout.log

--
Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=1
Jul 10 15:08:25 | DM_PATH=8:112
Jul 10 15:08:25 | DM_NR_VALID_PATHS=3
Jul 10 15:08:25 | DM_NAME=mpath0
--

Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=2
Jul 10 15:08:25 | DM_PATH=8:160
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath0
--

Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=3
Jul 10 15:08:25 | DM_PATH=8:160
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath0
--


Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=1
Jul 10 15:08:25 | DM_PATH=8:128
Jul 10 15:08:25 | DM_NR_VALID_PATHS=3
Jul 10 15:08:25 | DM_NAME=mpath1
--

Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=2
Jul 10 15:08:25 | DM_PATH=8:176
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath1
--

Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=3
Jul 10 15:08:25 | DM_PATH=8:176
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath1


Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=1
Jul 10 15:08:25 | DM_PATH=8:144
Jul 10 15:08:25 | DM_NR_VALID_PATHS=3
Jul 10 15:08:25 | DM_NAME=mpath2
--

Jul 10 15:08:25 | DM_ACTION=PATH_FAILED
Jul 10 15:08:25 | DM_SEQNUM=2
Jul 10 15:08:25 | DM_PATH=8:192
Jul 10 15:08:25 | DM_NR_VALID_PATHS=2
Jul 10 15:08:25 | DM_NAME=mpath2

This all looks OK: you can see the valid path count drop with each failure
and stop at the expected value. Multibus works a lot like link aggregation
in network bonding: I/O is simply steered down the remaining paths. What
might have happened is that the LVs backed by this mpath device somehow got
deactivated, and that's why the filesystem can't write back.
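
The "count down and stop" check can be scripted against the udev event dump. This is only a sketch, assuming the DM_* key/value lines appear exactly as in the excerpt above, with each event starting at DM_ACTION; parse_events and remaining_paths are hypothetical helper names:

```python
import re

# Matches lines such as "Jul 10 15:08:25 | DM_NR_VALID_PATHS=2"
KV_RE = re.compile(r"\| (DM_\w+)=(\S+)\s*$")

def parse_events(lines):
    """Collect DM_* key/value lines into one dict per event."""
    events, current = [], None
    for line in lines:
        m = KV_RE.search(line)
        if m is None:
            continue  # separators ("--") and blank lines
        key, value = m.groups()
        if key == "DM_ACTION":
            # Assumption: DM_ACTION is the first key of every event.
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value
    return events

def remaining_paths(events):
    """Return {map_name: last DM_NR_VALID_PATHS seen in a PATH_FAILED event}."""
    last = {}
    for ev in events:
        if (ev.get("DM_ACTION") == "PATH_FAILED"
                and "DM_NAME" in ev and "DM_NR_VALID_PATHS" in ev):
            last[ev["DM_NAME"]] = int(ev["DM_NR_VALID_PATHS"])
    return last
```

If every map ends with a non-zero count, the maps still had usable paths and the problem is more likely above or below multipath.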

Try this for me, please. Shut down those paths and keep them down, then
after about a minute, run:
# lvs -o lv_attr
 
We should see "w" (writeable) and "a" (active) for every volume; if not,
that's the problem.
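
If there are many volumes, the attribute check can be automated. A rough sketch, assuming the output comes from `lvs --noheadings -o lv_name,lv_attr` (two whitespace-separated columns per line); in lv_attr the second character is the permission bit and the fifth is the state bit. The function names and the sample volume names are made up for illustration:

```python
def lv_is_writable_and_active(lv_attr):
    """True if lv_attr has 'w' as 2nd character and 'a' as 5th character."""
    attr = lv_attr.strip()
    return len(attr) >= 5 and attr[1] == "w" and attr[4] == "a"

def problem_volumes(lvs_output):
    """Return names of LVs that are not both writeable and active.

    lvs_output is the text printed by
    `lvs --noheadings -o lv_name,lv_attr`.
    """
    bad = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        name, attr = fields[0], fields[1]
        if not lv_is_writable_and_active(attr):
            bad.append(name)
    return bad

# Hypothetical sample: "data" is writeable but no longer active.
sample = "  root -wi-ao\n  data -wi---\n"
print(problem_volumes(sample))
```

An empty list means every LV still reports "w" and "a" while the paths are down.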

Also, while in the "failed state", see if you can do a dd read from one
of the mpaths, like
dd if=/dev/mapper/mpath0 of=/dev/null count=10
If that works, try the same thing with a random sd device.
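
For scripting the same read test across several devices, here is a rough Python equivalent of the dd invocation. It is a sketch only: read_blocks is a hypothetical helper, 512 bytes is dd's default block size, and like dd it will simply block if the device is hung, so run it from a shell you can afford to lose:

```python
import os

def read_blocks(path, count=10, block_size=512):
    """Read up to `count` blocks from `path` and return total bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            chunk = os.read(fd, block_size)
            if not chunk:
                break  # end of device or file
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Calling read_blocks("/dev/mapper/mpath0") as root should return 5120 if the map can still service reads; an OSError or a call that never returns points at the layer that is stuck.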

I don't see anything amiss in the strace. The SG ioctls look OK and
the ioctls made to DM make sense.

We want to determine at which level block I/O is getting hung up.

** Changed in: multipath-tools (Ubuntu)
       Status: Confirmed => Incomplete

https://bugs.launchpad.net/bugs/1020436

Title:
  Cannot read superblock after FC multipath failover
