[Bug 1032550] Re: [multipath] failed to get sysfs information
Syslog of the last test run on kernel 3.5.0. The test started at 10:15.

** Attachment added: "syslog.server1"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3473174/+files/syslog.server1

--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1032550

Title:
  [multipath] failed to get sysfs information

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+subscriptions

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1032550] Re: [multipath] failed to get sysfs information
Syslog for server2, from the same test run, also on kernel 3.5.0.

** Attachment added: "syslog.server2"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3473175/+files/syslog.server2
[Bug 1032550] Re: [multipath] failed to get sysfs information
Peter,

First: happy new year!

I've been doing some more tests to track down the cause of this bug. Since it looks like a kernel bug, I tried reproducing it with kernel 3.5.0, version 3.5.0-21.32~precise1. I could reproduce the faulty paths that multipathd was unable to remove. However, there were no hanging processes this time and thus no kernel crash, which is an improvement.

During the test I saw this:

  LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
  size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=130 status=active
  | |- 4:0:1:1 sdi 8:128 active ready running
  | `- #:#:#:# - #:# active faulty running
  `-+- policy='round-robin 0' prio=10 status=enabled
    |- 4:0:0:1 sdg 8:96 active ready running
    `- #:#:#:# - #:# active faulty running
  LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
  size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=130 status=active
  | |- #:#:#:# - #:# failed faulty running
  | `- 4:0:0:0 sde 8:64 active ready running
  `-+- policy='round-robin 0' prio=10 status=enabled
    |- #:#:#:# - #:# active faulty running
    `- 4:0:1:0 sdh 8:112 active ready running

As you can see, multipathd again fails to remove the 'faulty' paths from the device map. However, for some reason this didn't lead to processes stuck in 'D' state this time. Meanwhile, the following messages were logged repeatedly:

  Jan 3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information
  Jan 3 10:24:14 ealxs00161 multipathd: sdd: unusable path

So multipathd kept retrying the removal, but it failed every time.
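The repeated multipathd messages quoted above can be tallied per device to see which paths multipathd keeps retrying. A minimal Python sketch, assuming syslog lines in exactly the format shown; the regular expression and the function name count_messages are illustrative, not part of multipath-tools. The sample reuses the two logged lines, with the sysfs line duplicated to stand in for a repetition:

```python
import re
from collections import Counter

# Matches multipathd syslog lines of the form
#   "... multipathd: sdd: failed to get sysfs information"
MPATHD_RE = re.compile(r"multipathd: (?P<dev>sd[a-z]+): (?P<msg>.+)$")


def count_messages(lines, wanted="failed to get sysfs information"):
    """Count how often each sd* device logged the `wanted` multipathd message."""
    counts = Counter()
    for line in lines:
        m = MPATHD_RE.search(line)
        if m and m.group("msg").strip() == wanted:
            counts[m.group("dev")] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Jan 3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information",
        "Jan 3 10:24:14 ealxs00161 multipathd: sdd: unusable path",
        "Jan 3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information",
    ]
    print(count_messages(log))                   # Counter({'sdd': 2})
    print(count_messages(log, "unusable path"))  # Counter({'sdd': 1})
```

Feeding it a whole syslog file (for example `count_messages(open(path))`) would show whether only the removed paths are affected.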
After I brought the path back up, it was restored and everything was fine again:

  LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
  size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=130 status=active
  | |- 4:0:1:1 sdi 8:128 active ready running
  | `- 3:0:0:1 sdc 8:32 active ready running
  `-+- policy='round-robin 0' prio=10 status=enabled
    |- 4:0:0:1 sdg 8:96 active ready running
    `- 3:0:1:1 sdf 8:80 active ready running
  LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 DGC,VRAID
  size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=130 status=active
  | |- 3:0:1:0 sdd 8:48 active ready running
  | `- 4:0:0:0 sde 8:64 active ready running
  `-+- policy='round-robin 0' prio=10 status=enabled
    |- 3:0:0:0 sdb 8:16 active ready running
    `- 4:0:1:0 sdh 8:112 active ready running

After this, failing over again worked just fine; the paths that had failed to be removed the last time were now removed without problems. Both machines survived about 10 up/down test runs. I'll attach the syslog of this run shortly.
[Bug 1032550] Re: [multipath] failed to get sysfs information
The normal state of the disks is as follows:

  LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-2 DGC,VRAID
  size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=130 status=active
  | |- 4:0:0:1 sdg 8:96 active ready running
  | `- 3:0:0:1 sdc 8:32 active ready running
  `-+- policy='round-robin 0' prio=10 status=enabled
    |- 4:0:1:1 sdi 8:128 active ready running
    `- 3:0:1:1 sde 8:64 active ready running
  LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-1 DGC,VRAID
  size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=130 status=active
  | |- 4:0:1:0 sdh 8:112 active ready running
  | `- 3:0:1:0 sdd 8:48 active ready running
  `-+- policy='round-robin 0' prio=10 status=enabled
    |- 4:0:0:0 sdf 8:80 active ready running
    `- 3:0:0:0 sdb 8:16 active ready running

So there is one LUN of 20G and another of 200G, each with four paths to the SAN. Today's test started at Dec 21 10:53:44 with this message:

  ealxs00161 kernel: [62445.130300] qla2xxx [:07:00.0]-500b:3: LOOP DOWN detected (2 3 0 0).

It took several up/down sequences to reproduce the problem.
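With four healthy paths per LUN as the baseline, a saved `multipath -ll` listing can be checked mechanically for paths that are no longer "active ready". The sketch below is a hypothetical Python helper, assuming the tree layout used in this thread (fields: H:C:T:L, device, major:minor, dm state, checker state, device state). The samples are the normal LUN-DATABASE map above and the degraded LUN-LOGGING map from the 3.5.0 run:

```python
import re

# Map header, e.g. "LUN-DATABASE (3600601...) dm-2 DGC,VRAID"
MAP_RE = re.compile(r"^(?P<alias>\S+) \((?P<wwid>[0-9a-f]+)\) (?P<dm>dm-\d+)")
# Path line, e.g. "| |- 4:0:0:1 sdg 8:96 active ready running"
# or a vanished path: "`- #:#:#:# - #:# active faulty running"
PATH_RE = re.compile(
    r"[|`]-\s+(?P<hctl>\S+)\s+(?P<dev>\S+)\s+(?P<majmin>\S+)\s+"
    r"(?P<dm_state>\S+)\s+(?P<chk_state>\S+)\s+(?P<dev_state>\S+)\s*$"
)


def summarize(topology: str) -> dict:
    """Return {alias: (healthy_paths, other_paths)} from `multipath -ll` text."""
    counts = {}
    alias = None
    for line in topology.splitlines():
        header = MAP_RE.match(line)
        if header:
            alias = header.group("alias")
            counts[alias] = [0, 0]
            continue
        path = PATH_RE.search(line)  # path-group lines ("|-+- policy=...") don't match
        if path and alias is not None:
            healthy = (path.group("dm_state") == "active"
                       and path.group("chk_state") == "ready")
            counts[alias][0 if healthy else 1] += 1
    return {name: tuple(c) for name, c in counts.items()}


NORMAL = """\
LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-2 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:0:1 sdg 8:96 active ready running
| `- 3:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:1:1 sdi 8:128 active ready running
  `- 3:0:1:1 sde 8:64 active ready running
"""

DEGRADED = """\
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- #:#:#:# - #:# failed faulty running
| `- 4:0:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- #:#:#:# - #:# active faulty running
  `- 4:0:1:0 sdh 8:112 active ready running
"""

if __name__ == "__main__":
    print(summarize(NORMAL))    # {'LUN-DATABASE': (4, 0)}
    print(summarize(DEGRADED))  # {'LUN-LOGGING': (2, 2)}
```

Any map whose second count is non-zero after a path comes back would be a candidate for the "unable to remove" state described in this bug.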
[Bug 1032550] Re: [multipath] failed to get sysfs information
vmcore file of crashed kernel.

** Attachment added: "vmcore-crash.tgz"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3465398/+files/vmcore-crash.tgz
[Bug 1032550] Re: [multipath] failed to get sysfs information
I've uploaded the vmcore file separately because I have some doubts about whether the dump file created by apport is complete. Please find it here: http://www.rmoesbergen.nl/vmcore-crash.tgz
[Bug 1032550] Re: [multipath] failed to get sysfs information
Peter,

I got it to crash again, this time with a nice kernel dump. The dump can be fetched here: http://www.rmoesbergen.nl/linux-image-3.2.0-34-generic.0.crash.gz

The crash itself looked like this:

  Dec 21 11:07:32 ealxs00161 kernel: [63272.392812] sd 4:0:1:1: emc: ALUA failover mode detected
  Dec 21 11:07:32 ealxs00161 kernel: [63272.392820] sd 4:0:1:1: emc: at SP B Port 1 (owned, default SP B)
  Dec 21 11:07:32 ealxs00161 kernel: [63272.393180] sd 3:0:0:1: emc: ALUA failover mode detected
  Dec 21 11:07:32 ealxs00161 kernel: [63272.393187] sd 3:0:0:1: emc: at SP B Port 0 (owned, default SP B)
  Dec 21 11:10:36 ealxs00161 kernel: [63455.641431] qla2xxx [:07:00.0]-500b:3: LOOP DOWN detected (2 3 0 0).
  Dec 21 11:10:52 ealxs00161 multipathd: sdf: remove path (uevent)
  Dec 21 11:10:52 ealxs00161 kernel: [63471.548255] rport-3:0-1: blocked FC remote port time out: removing target and saving binding
  Dec 21 11:10:52 ealxs00161 kernel: [63471.676065] rport-3:0-0: blocked FC remote port time out: removing target and saving binding
  Dec 21 11:11:08 ealxs00161 cimserver[2079]: Authentication failed for user=root.
  Dec 21 11:11:10 ealxs00161 cimserver[2079]: Authentication failed for user=root.
  Dec 21 11:13:28 ealxs00161 kernel: [63627.745648] INFO: task jbd2/dm-1-8:1530 blocked for more than 120 seconds.
  Dec 21 11:13:28 ealxs00161 kernel: [63627.746025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756371] jbd2/dm-1-8 D 8803aa11a620 0 1530 2 0x
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756380] 880416141ac0 0046 880416141a60 88042ee137c0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756388] 880416141fd8 880416141fd8 880416141fd8 000137c0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756395] 81c0d020 880415ef9700 880416141a90 88042ee14080
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756403] Call Trace:
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756416] [] ? __lock_page+0x70/0x70
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756431] [] schedule+0x3f/0x60
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756441] [] io_schedule+0x8f/0xd0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756451] [] sleep_on_page+0xe/0x20
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756460] [] __wait_on_bit+0x5f/0x90
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756470] [] wait_on_page_bit+0x78/0x80
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756481] [] ? autoremove_wake_function+0x40/0x40
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756492] [] filemap_fdatawait_range+0x10c/0x1a0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756503] [] filemap_fdatawait+0x2b/0x30
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756516] [] journal_finish_inode_data_buffers+0x70/0x170
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756528] [] jbd2_journal_commit_transaction+0x665/0x1240
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756538] [] ? add_wait_queue+0x60/0x60
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756548] [] kjournald2+0xbb/0x220
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756557] [] ? add_wait_queue+0x60/0x60
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756566] [] ? commit_timeout+0x10/0x10
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756575] [] kthread+0x8c/0xa0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756587] [] kernel_thread_helper+0x4/0x10
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756596] [] ? flush_kthread_worker+0xa0/0xa0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756606] [] ? gs_change+0x13/0x13
  Dec 21 11:13:28 ealxs00161 kernel: [63627.756612] Kernel panic - not syncing: hung_task: blocked tasks
  Dec 21 11:13:28 ealxs00161 kernel: [63627.768425] Pid: 66, comm: khungtaskd Tainted: G W 3.2.0-34-generic #53-Ubuntu
  Dec 21 11:13:28 ealxs00161 kernel: [63627.779691] Call Trace:
  Dec 21 11:13:28 ealxs00161 kernel: [63627.790147] [] panic+0x91/0x1a4
  Dec 21 11:13:28 ealxs00161 kernel: [63627.800888] [] check_hung_task+0xb2/0xc0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.811370] [] check_hung_uninterruptible_tasks+0x11b/0x140
  Dec 21 11:13:28 ealxs00161 kernel: [63627.821998] [] ? check_hung_uninterruptible_tasks+0x140/0x140
  Dec 21 11:13:28 ealxs00161 kernel: [63627.833715] [] watchdog+0x4f/0x60
  Dec 21 11:13:28 ealxs00161 kernel: [63627.844538] [] kthread+0x8c/0xa0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.855370] [] kernel_thread_helper+0x4/0x10
  Dec 21 11:13:28 ealxs00161 kernel: [63627.866367] [] ? flush_kthread_worker+0xa0/0xa0
  Dec 21 11:13:28 ealxs00161 kernel: [63627.877343] [] ? gs_change+0x13/0x13

Output of ps xa, just before the crash:

  PID TTY STAT TIME COMMAND
    1 ?   Ss   0:02 /sbin/init
    2 ?   S    0:00 [kthreadd]
    3 ?   S    0:01 [ksoftirqd/0]
    6 ?   S    0:01 [migration/0]
    7 ?   S    0:00 [watchdog/0]
    8 ?   S    0:00 [migration/1]
   10 ?   S    0:00 [ksoftirqd/1]
   12 ?   S    0:00 [w
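The printk timestamps in this excerpt (seconds since boot) make it possible to measure how long the machine ran between the FC loop going down, the remote port being removed, and the hung-task report. A small Python sketch, assuming syslog'd kernel lines in the format shown; the helper name kernel_ts is hypothetical and the three sample lines are copied from the log above:

```python
import re

# Kernel ring-buffer timestamps such as "[63455.641431]" are seconds since boot.
TS_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def kernel_ts(line: str) -> float:
    """Extract the printk timestamp from a syslog'd kernel line."""
    m = TS_RE.search(line)
    if m is None:
        raise ValueError(f"no kernel timestamp in: {line!r}")
    return float(m.group(1))


LOOP_DOWN = ("Dec 21 11:10:36 ealxs00161 kernel: [63455.641431] qla2xxx "
             "[:07:00.0]-500b:3: LOOP DOWN detected (2 3 0 0).")
RPORT_GONE = ("Dec 21 11:10:52 ealxs00161 kernel: [63471.548255] rport-3:0-1: "
              "blocked FC remote port time out: removing target and saving binding")
HUNG_TASK = ("Dec 21 11:13:28 ealxs00161 kernel: [63627.745648] INFO: task "
             "jbd2/dm-1-8:1530 blocked for more than 120 seconds.")

if __name__ == "__main__":
    t0 = kernel_ts(LOOP_DOWN)
    print(f"LOOP DOWN -> rport removed: {kernel_ts(RPORT_GONE) - t0:.3f} s")  # 15.907 s
    print(f"LOOP DOWN -> hung task:     {kernel_ts(HUNG_TASK) - t0:.3f} s")   # 172.104 s
```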
[Bug 1032550] Re: [multipath] failed to get sysfs information
Removing 'reset_devices' and 'irq_poll' from /etc/init.d/kdump did the trick; both machines now generate a nice vmcore file when they crash. I'll re-test with all your settings applied and report back. Thanks.
[Bug 1032550] Re: [multipath] failed to get sysfs information
Peter, is there another way to get you a kernel crash dump? Whatever I try, the crash dump tools don't produce one.
[Bug 1032550] Re: [multipath] failed to get sysfs information
The problem is not making the kernel crash, that's easy :) The issue is that no crash dump is ever generated. The whole kexec dump setup just doesn't seem to work. I already went through all the troubleshooting tips and tricks, but the kernel just crashes and hangs forever, and only a reset works. Causing a crash with echo c > /proc/sysrq-trigger has the same effect: crash, hang, nothing...
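Since the symptom is that the kexec capture kernel never takes over, one thing worth confirming before each run is that a crash kernel is actually reserved and loaded. A minimal Python sketch, assuming the standard `crashkernel=` boot parameter and the `/sys/kernel/kexec_crash_loaded` flag (which reads 1 once a panic kernel has been loaded with `kexec -p`); the function names and sample command lines are hypothetical:

```python
from pathlib import Path


def crash_kernel_problems(cmdline: str, kexec_crash_loaded: str) -> list:
    """Return human-readable reasons why kdump is unlikely to capture a dump."""
    problems = []
    # Memory for the capture kernel must be reserved at boot via crashkernel=.
    if not any(tok.startswith("crashkernel=") for tok in cmdline.split()):
        problems.append("no crashkernel= reservation on the kernel command line")
    # The capture kernel must have been loaded (kexec -p) before the panic.
    if kexec_crash_loaded.strip() != "1":
        problems.append("no crash kernel loaded (kexec_crash_loaded is not 1)")
    return problems


def live_problems() -> list:
    """Same check against the running system (Linux only)."""
    return crash_kernel_problems(
        Path("/proc/cmdline").read_text(),
        Path("/sys/kernel/kexec_crash_loaded").read_text(),
    )


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(crash_kernel_problems("root=/dev/sda1 ro crashkernel=128M", "1\n"))  # []
    print(crash_kernel_problems("root=/dev/sda1 ro quiet", "0\n"))
```

Both checks passing does not guarantee a dump (the capture kernel can still hang, as seen here), but a failure on either one would explain getting nothing at all.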
[Bug 1032550] Re: [multipath] failed to get sysfs information
I've been trying to get the crash dump setup to work, but it just won't generate a dump, or do anything at all, when the kernel crashes. Is there another way to get you the information you want? Would a screenshot of a high-resolution console work?

About the hanging processes: the only processes I saw in 'D' state were these two:

  '/sbin/multipath -v0 /dev/sdl' [14785]
  '/sbin/multipath -v0 /dev/sdj' [14739]

And of course anything accessing the disks on top of the multipath devices. When I do another run, I'll make sure to save a dump of the process list as well.
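For the next run, the process list could be captured with `ps -eo pid,stat,args` and filtered for uninterruptible sleep (a STAT value starting with D). A minimal Python sketch; the function name d_state is hypothetical, and the sample output is made up from the two PIDs and commands quoted above rather than captured from the machine:

```python
def d_state(ps_output: str) -> list:
    """Return (pid, command) pairs for processes in uninterruptible sleep."""
    stuck = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)                 # PID, STAT, rest = command
        if len(fields) == 3 and fields[1].startswith("D"):
            stuck.append((int(fields[0]), fields[2]))
    return stuck


# Hypothetical `ps -eo pid,stat,args` output, for illustration only.
SAMPLE = """\
  PID STAT COMMAND
    1 Ss   /sbin/init
14739 D    /sbin/multipath -v0 /dev/sdj
14785 D    /sbin/multipath -v0 /dev/sdl
"""

if __name__ == "__main__":
    for pid, cmd in d_state(SAMPLE):
        print(pid, cmd)
```

Running the capture in a loop with timestamps would also show when the multipath processes first entered D state relative to the LOOP DOWN message.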
[Bug 1032550] Re: [multipath] failed to get sysfs information
** Attachment added: "syslog.test2.survived.gz"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3457383/+files/syslog.test2.survived.gz
[Bug 1032550] Re: [multipath] failed to get sysfs information
I just retested with multipath-tools version '0.4.9-3ubuntu5+lp1032250dbg1' on two systems simultaneously. The test consisted of repeatedly shutting down the fibre port in the switch for one of the paths. One system survived (see syslog.test2.survived.gz), one did not (syslog.test2.broken.gz). The crash dump tools were installed and panic_on_oops was set to '1', but there were no kernel BUGs or OOPSes this time.

The 'failed to get sysfs information' error is gone, however. It might have been replaced with the following, though:

  Dec 11 16:04:44 ealxs00162 udevd[8828]: rename '/dev/disk/by-id/wwn-0x6006016061e02e008a2d4fa5b307e211.udev-tmp' '/dev/disk/by-id/wwn-0x6006016061e02e008a2d4fa5b307e211' failed: No such file or directory

Hope this helps.

** Attachment added: "syslog.test2.broken.gz"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3457382/+files/syslog.test2.broken.gz
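To check which multipath map the failed udev rename belongs to, the by-id "wwn-0x..." name can be mapped to the WWID that `multipath -ll` prints in parentheses. A sketch, assuming that for these NAA-identified LUNs the WWID is the NAA hex string with a leading "3" (the SCSI designator type for NAA), which matches the WWIDs quoted in this thread such as 36006016061e02e003cf1aca4ae07e211; the helper name is hypothetical:

```python
import re

# by-id links look like /dev/disk/by-id/wwn-0x<NAA identifier in hex>
WWN_RE = re.compile(r"/dev/disk/by-id/wwn-0x([0-9a-f]+)")


def wwids_in_udev_line(line: str) -> set:
    """Return multipath-style WWIDs for every wwn-0x... link named in `line`.

    Assumption: WWID = "3" + NAA hex, as in the multipath output in this bug.
    """
    return {"3" + naa for naa in WWN_RE.findall(line)}


LINE = ("Dec 11 16:04:44 ealxs00162 udevd[8828]: rename "
        "'/dev/disk/by-id/wwn-0x6006016061e02e008a2d4fa5b307e211.udev-tmp' "
        "'/dev/disk/by-id/wwn-0x6006016061e02e008a2d4fa5b307e211' "
        "failed: No such file or directory")

if __name__ == "__main__":
    print(wwids_in_udev_line(LINE))  # {'36006016061e02e008a2d4fa5b307e211'}
```

The resulting WWID can then be searched for in the `multipath -ll` output of ealxs00162 to see which LUN was affected.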
[Bug 1032550] Re: [multipath] failed to get sysfs information
@Peter: I'll try to reproduce this on our acceptance systems with crashdump installed and report back.
[Bug 1032550] Re: [multipath] failed to get sysfs information
I'm seeing the same issue with an EMC CLARiiON; see the attached syslog. On my systems this problem leads to hanging processes and a server crash that can only be fixed with a reboot.

** Attachment added: "syslog of multipath failure"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3456041/+files/syslog_multipath_fail.gz
[Bug 1032550] Re: [multipath] failed to get sysfs information
** Attachment added: "My multipath.conf"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+attachment/3456042/+files/multipath.conf