Hi Lucas,
FYI, this is affecting a customer deployment, and I am working with
Jarred on this issue. We have tried a few variations of the multipath
configuration; the current config is as follows:
defaults {
    user_friendly_names yes
    find_multipaths yes
    polling_interval 10
}

devices {
    device {
        vendor "PURE"
        product "FlashArray"
        path_selector "queue-length 0"
        path_grouping_policy "group_by_prio"
        rr_min_io 1
        path_checker tur
        fast_io_fail_tmo 1
        dev_loss_tmo infinity
        no_path_retry 5
        failback immediate
        prio alua
        hardware_handler "1 alua"
        max_sectors_kb 4096
    }
}

multipaths {
    multipath {
        wwid "<WWID>"
        alias data1
    }
}
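To rule out a mismatch between this file and what the daemon actually
loaded, we can also capture the running state next time. A quick sketch
(standard multipath-tools commands; redirect targets are just our
choice of file names):

    # Config as parsed by the running daemon, merged with built-ins:
    multipathd show config > multipathd-config.txt

    # Current topology and path states for the data1 map:
    multipath -ll > multipath-topology.txt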
This configuration is used successfully on other systems (Ubuntu 18.04,
Debian, CentOS) in the customer environment and is the configuration
recommended by their storage team.
For context, the affected Ubuntu nodes are part of an OpenStack
deployment. The multipath configuration is in place to set up a
Fibre Channel connection for the device used as a Ceph OSD. Failover
was tested in two different ways, both of which left the nodes
unresponsive. First, we powered off one of the F5 switches to simulate
a power failure. Second, we reset the I/O module in UCS Manager for the
chassis where the nodes are located.
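If it would help narrow things down, we could also fail a single path
from the OS side instead of pulling the whole fabric. A rough sketch,
where sdX is a placeholder for one of the path devices listed by
`multipath -ll`:

    # Take one path of the multipath map offline:
    echo offline > /sys/block/sdX/device/state

    # Watch how multipathd reacts to the failed path:
    multipathd show paths

    # Restore the path afterwards:
    echo running > /sys/block/sdX/device/state

Happy to run that kind of test if it would produce more useful data
than the switch/IOM failovers.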
As you can see, there is very little information in the logs about what
is actually causing the node to lock up. Do you have guidance on how to
gather more detailed logs? Jarred and I can provide live access to the
environment if needed.
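In the meantime, would something along these lines be a sensible way to
capture more detail during the next failover test? This is only a
sketch; the verbosity level and output file names are our guesses:

    # Raise multipathd logging via the defaults section of
    # /etc/multipath.conf, then restart the daemon:
    #   verbosity 3

    # Run a verbose scan and keep the full output:
    multipath -v3 > multipath-v3.log 2>&1

    # multipathd and kernel messages around the failover window:
    journalctl -u multipathd > multipathd-journal.log
    dmesg -T > dmesg.log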