[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Sigh, well, that illustrates why this design was a bad idea. I changed the parameters to use their short forms, which should keep the service filename length under the limit. To be totally clear, this is in no way a correct fix; I just want to confirm that this is actually the problem. Then it'll have to get fixed upstream in a 'correct' way.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874270

Title:
  NVMe/FC connections fail to reestablish after controller is reset

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvme-cli/+bug/1874270/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
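The length argument can be checked with a little arithmetic. The sketch below is not nvme-cli or systemd code: it re-implements what we understand to be systemd's unit-name escaping (ASCII letters, digits, ':', '_' and '.' pass through, '/' becomes '-', a leading '.' and everything else become a lowercase `\xNN` escape) and compares the resulting `nvmf-connect@` unit-name lengths for the long-option string seen in the logs in this thread and a hypothetical short-option equivalent. `systemd_escape`, `unit_name` and the `SHORT` string are illustrative assumptions; which exact limit is being hit is not established here.

```python
import string

# Characters we assume systemd leaves unescaped (our reading of its
# unit-name escaping; '-' and '\' are escaped even though they are valid).
PASS_THROUGH = set(string.ascii_letters + string.digits + ":_.")


def systemd_escape(s: str) -> str:
    """Approximate `systemd-escape` for a plain (non-path) string."""
    out = []
    for i, byte in enumerate(s.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")                 # '/' maps to '-'
        elif ch in PASS_THROUGH and not (i == 0 and ch == "."):
            out.append(ch)                  # a leading '.' is escaped
        else:
            out.append(f"\\x{byte:02x}")    # everything else becomes \xNN
    return "".join(out)


def unit_name(args: str) -> str:
    """Build the templated unit name the udev rule would start (hypothetical helper)."""
    return f"nvmf-connect@{systemd_escape(args)}.service"


# Long-option instance string, as it appears (unescaped) in this thread's logs.
LONG = ("--device=none --transport=fc "
        "--traddr=nn-0x200200a098d8580e:pn-0x202300a098d8580e "
        "--trsvcid=none "
        "--host-traddr=nn-0x2090fadcc5ce:pn-0x1090fadcc5ce")

# Hypothetical short-option equivalent using the -d/-t/-a/-s/-w letters from
# connect-all's help output; not necessarily what the test build passes.
SHORT = ("-d none -t fc "
         "-a nn-0x200200a098d8580e:pn-0x202300a098d8580e "
         "-s none "
         "-w nn-0x2090fadcc5ce:pn-0x1090fadcc5ce")

if __name__ == "__main__":
    for label, args in (("long", LONG), ("short", SHORT)):
        name = unit_name(args)
        print(f"{label:5s} {len(name):3d} chars  {name}")
```

Under these assumptions the long form reproduces the exact unit name from the Apr 8 syslog (239 characters), while the hypothetical short form comes to 182 characters.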
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Dan, I've attached /var/log/syslog and journalctl logs of a recreate after installing nvme-cli_1.9-1ubuntu0.1+bug1874270v20210408b2_amd64 and rebooting the host.

Apr 8 14:44:00 ICTM1608S01H1 root: JD: Resetting controller B
Apr 8 14:44:09 ICTM1608S01H1 kernel: [ 196.190003] lpfc :af:00.1: 5:(0):6172 NVME rescanned DID x3d3800 port_state x2
Apr 8 14:44:09 ICTM1608S01H1 kernel: [ 196.190082] nvme nvme2: NVME-FC{2}: controller connectivity lost. Awaiting Reconnect
Apr 8 14:44:09 ICTM1608S01H1 kernel: [ 196.190176] lpfc :18:00.1: 1:(0):6172 NVME rescanned DID x3d3800 port_state x2
Apr 8 14:44:09 ICTM1608S01H1 kernel: [ 196.190268] nvme nvme6: NVME-FC{6}: controller connectivity lost. Awaiting Reconnect
Apr 8 14:44:09 ICTM1608S01H1 kernel: [ 196.211805] nvme nvme2: NVME-FC{2}: io failed due to lldd error 6
Apr 8 14:44:09 ICTM1608S01H1 systemd[1]: Started NVMf auto-connect scan upon nvme discovery controller Events.
Apr 8 14:44:09 ICTM1608S01H1 systemd[1]: nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service: Succeeded.
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2827]: filp(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2828]: dentry(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2827]: pid(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2828]: inode_cache(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2829]: kmalloc-rcl-512(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2829]: PING(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2828]: skbuff_head_cache(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2827]: kmalloc-1k(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2830]: sock_inode_cache(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09 ICTM1608S01H1 systemd-udevd[2829]: kmalloc-64(639:nvmf-connect@\x2d\x2ddevice\x3dnone\x20\x2d\x2dtransport\x3dfc\x20\x2d\x2dtraddr\x3dnn\x2d0x200200a098d8580e:pn\x2d0x202300a098d8580e\x20\x2d\x2dtrsvcid\x3dnone\x20\x2d\x2dhost\x2dtraddr\x3dnn\x2d0x2090fadcc5ce:pn\x2d0x1090fadcc5ce.service): Failed to process device, ignoring: File name too long
Apr 8 14:44:09
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
> It looks like connect-all didn't recognize the "matching" flag

Ah, sorry, that parameter is only used in later versions. I uploaded a new build to the PPA; can you try that one? It just started building now, so it might take some time to finish building and get published.

** Changed in: nvme-cli (Ubuntu) Status: Incomplete => Triaged
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Dan, I've attached /var/log/syslog and journalctl logs of a recreate after installing nvme-cli_1.9-1ubuntu0.1+bug1874270v20210408b1_amd64 and rebooting the host. It looks like connect-all didn't recognize the "matching" flag.

Apr 8 11:48:45 ICTM1608S01H1 root: JD: Resetting controller B
Apr 8 11:49:39 ICTM1608S01H1 kernel: [ 545.652088] lpfc :af:00.1: 5:(0):6172 NVME rescanned DID x3d3800 port_state x2
Apr 8 11:49:39 ICTM1608S01H1 kernel: [ 545.652166] nvme nvme2: NVME-FC{2}: controller connectivity lost. Awaiting Reconnect
Apr 8 11:49:39 ICTM1608S01H1 kernel: [ 545.652203] lpfc :18:00.1: 1:(0):6172 NVME rescanned DID x3d3800 port_state x2
Apr 8 11:49:39 ICTM1608S01H1 kernel: [ 545.652276] nvme nvme6: NVME-FC{6}: controller connectivity lost. Awaiting Reconnect
Apr 8 11:49:39 ICTM1608S01H1 kernel: [ 545.673853] nvme nvme2: NVME-FC{2}: io failed due to lldd error 6
Apr 8 11:49:39 ICTM1608S01H1 systemd[1]: Started NVMf auto-connect scan upon nvme discovery controller Events.
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: connect-all: unrecognized option '--matching'
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: Discover NVMeoF subsystems and connect to them [ --transport=, -t ]--- transport type
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --traddr=, -a ] --- transport address
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --trsvcid=, -s ] --- transport service id (e.g. IP
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: port)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --host-traddr=, -w ] --- host traddr (e.g. FC WWN's)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --hostnqn=, -q ] --- user-defined hostnqn (if default
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: not used)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --hostid=, -I ] --- user-defined hostid (if default
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: not used)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --raw=, -r ] --- raw output file
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --device=, -d ] --- use existing discovery controller
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: device
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --keep-alive-tmo=, -k ] --- keep alive timeout period in
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: seconds
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --reconnect-delay=, -c ] --- reconnect timeout period in
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: seconds
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --ctrl-loss-tmo=, -l ] --- controller loss timeout period in
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: seconds
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --hdr_digest, -g ] --- enable transport protocol header
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: digest (TCP transport)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --data_digest, -G ] --- enable transport protocol data
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: digest (TCP transport)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --nr-io-queues=, -i ] --- number of io queues to use
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: (default is core count)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --nr-write-queues=, -W ] --- number of write queues to use
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: (default 0)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --nr-poll-queues=, -P ] --- number of poll queues to use
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: (default 0)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --queue-size=, -Q ] --- number of io queue elements to
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: use (default 128)
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --persistent, -p ] --- persistent discovery connection
Apr 8 11:49:39 ICTM1608S01H1 nvme[7329]: [ --quiet, -Q ] --- suppress already connected errors

** Attachment added: "nvme-cli-1.9-1ubuntu0.1+bug1874270v20210408b1-logs.zip"
   https://bugs.launchpad.net/ubuntu/+source/nvme-cli/+bug/1874270/+attachment/5485673/+files/nvme-cli-1.9-1ubuntu0.1+bug1874270v20210408b1-logs.zip
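An "unrecognized option" failure like the `--matching` one above could be caught before the service runs by checking requested options against the help text the installed connect-all prints. The sketch below is a hypothetical helper, not part of nvme-cli: it extracts the long/short option pairs from help lines in the format shown in this log (where the argument placeholders are absent) and reports any requested long option the installed version does not list.

```python
from __future__ import annotations

import re

# Matches "[ --long-name=, -x ]" and "[ --flag, -x ]" as printed in the log above.
OPTION_RE = re.compile(r"\[\s*--([\w-]+)=?\s*,\s*-(\w)\s*\]")

# A subset of the connect-all help text from the Apr 8 log.
HELP = """
[ --transport=, -t ] --- transport type
[ --traddr=, -a ] --- transport address
[ --trsvcid=, -s ] --- transport service id (e.g. IP port)
[ --host-traddr=, -w ] --- host traddr (e.g. FC WWN's)
[ --device=, -d ] --- use existing discovery controller device
[ --hdr_digest, -g ] --- enable transport protocol header digest (TCP transport)
[ --persistent, -p ] --- persistent discovery connection
"""


def parse_help(help_text: str) -> dict[str, str]:
    """Map each long option name to its short letter."""
    return {m.group(1): m.group(2) for m in OPTION_RE.finditer(help_text)}


def unsupported(requested: list[str], help_text: str) -> list[str]:
    """Return the requested '--name[=value]' options missing from the help text."""
    known = parse_help(help_text)
    return [opt for opt in requested
            if opt.lstrip("-").split("=", 1)[0] not in known]


if __name__ == "__main__":
    print(unsupported(["--matching", "--transport=fc", "--device=none"], HELP))
    # prints ['--matching']
```

The same long-to-short map is what one would need to rewrite the udev rule's arguments into short form.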
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
@jduong, can you test with the nvme-cli package from this PPA: https://launchpad.net/~ddstreet/+archive/ubuntu/lp1874270
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Hmm, I looked at the udev rule and the service it calls a bit more closely, and I'm not sure that udev rule has *ever* worked. It's attempting to pass a full list of parameters as the template instance value on the command line; I can kind of understand the intention, but it's not a good way to implement it.

$ sudo systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200400a098d85eb4:pn-0x203400a098d85eb4\t--trsvcid=none\t--host-traddr=nn-0x2090fadcc57dpn-0x1090fadcc57d.service
Invalid unit name "nvmf-connect@--device=nonet--transport=fct--traddr=nn-0x200400a098d85eb4:pn-0x203400a098d85eb4t--trsvcid=nonet--host-traddr=nn-0x2090fadcc57dpn-0x1090fadcc57d.service" escaped as "nvmf-connect@--device\x3dnonet--transport\x3dfct--traddr\x3dnn-0x200400a098d85eb4:pn-0x203400a098d85eb4t--trsvcid\x3dnonet--host-traddr\x3dnn-0x2090fadcc57dpn-0x1090fadcc57d.service" (maybe you should use systemd-escape?).

The problem here is that systemctl doesn't allow the "=" character in the unit template instance.
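The "=" problem can be illustrated with a small sketch. This is an approximation written for this thread, not systemd's implementation: it checks an instance string against the characters we believe systemd accepts in unit names (ASCII letters, digits, ':', '-', '_', '.', and '\'), escapes it the way we understand `systemd-escape` to, and unescapes it again, which is roughly what a template unit receives through the `%I` specifier. Length limits and '@' placement are deliberately not checked. (In the interactive command above, note also that the shell turned each unquoted `\t` into a plain `t` before systemctl saw it.)

```python
import string

# Characters we assume systemd accepts inside a unit name (our reading of its
# VALID_CHARS); '=' and ' ' are not among them.
VALID_UNIT_CHARS = set(string.ascii_letters + string.digits + ":-_.\\")
# Subset left untouched by escaping; '-' and '\' are escaped as well.
PASS_THROUGH = set(string.ascii_letters + string.digits + ":_.")


def is_valid_instance(instance: str) -> bool:
    """Character check only; length and '@' placement are not checked."""
    return bool(instance) and all(c in VALID_UNIT_CHARS for c in instance)


def escape(s: str) -> str:
    """Approximate systemd-escape: '/' -> '-', others outside PASS_THROUGH -> \\xNN."""
    out = []
    for i, byte in enumerate(s.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in PASS_THROUGH and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append(f"\\x{byte:02x}")
    return "".join(out)


def unescape(s: str) -> str:
    """Reverse escape(): \\xNN becomes the byte, '-' becomes '/' again."""
    raw = bytearray()
    i = 0
    while i < len(s):
        if s.startswith("\\x", i) and i + 4 <= len(s):
            raw.append(int(s[i + 2:i + 4], 16))
            i += 4
        else:
            raw.extend(("/" if s[i] == "-" else s[i]).encode("utf-8"))
            i += 1
    return raw.decode("utf-8")


if __name__ == "__main__":
    raw = "--device=none --transport=fc --trsvcid=none"
    print(is_valid_instance(raw))        # False: contains '=' and ' '
    esc = escape(raw)
    print(esc, is_valid_instance(esc))
    print(unescape(esc) == raw)          # True: the round trip is lossless
```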
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Dan, where do I change the kernel rport timeout? And how can I go about changing the timeout on a server with Emulex cards installed versus QLogic?
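For orientation only: the "blocked FC remote port time out: removing rport" messages in this thread come from the kernel's FC transport class, which exposes a per-remote-port `dev_loss_tmo` attribute under `/sys/class/fc_remote_ports/`. The read-only sketch below just lists the current values, assuming that standard sysfs layout. It changes nothing, and it does not settle which driver-specific setting (Emulex lpfc versus QLogic qla2xxx) also governs the separate NVMe-FC `dev_loss_tmo (60)` seen in the logs. `read_dev_loss_tmo` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_RPORTS = Path("/sys/class/fc_remote_ports")


def read_dev_loss_tmo(base: Path = SYSFS_RPORTS) -> dict[str, int]:
    """Return {rport name: dev_loss_tmo in seconds}; empty if no FC rports exist."""
    result: dict[str, int] = {}
    if not base.is_dir():
        return result
    for entry in sorted(base.iterdir()):
        attr = entry / "dev_loss_tmo"
        try:
            result[entry.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable for this port
    return result


if __name__ == "__main__":
    for rport, tmo in read_dev_loss_tmo().items():
        print(f"{rport}: dev_loss_tmo={tmo}s")
```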
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
It looks like the service is failing because your controller is in the process of resetting, which appears to take several minutes. I'm not sure how the nvme-cli tools are designed to handle such a long reset time, but my first guess would be to increase the kernel rport timeout, which, judging from the log output, appears to be around 30 seconds. In your hardware's case, it seems like that timeout should be more than 180 seconds.

Apr 07 11:45:10 ICTM1608S01H1 root[2894793]: JD: Resetting controller A
Apr 07 11:45:28 ICTM1608S01H1 kernel: lpfc :af:00.1: 5:(0):6172 NVME rescanned DID x3d0a00 port_state x2
Apr 07 11:45:28 ICTM1608S01H1 kernel: lpfc :18:00.1: 1:(0):6172 NVME rescanned DID x3d0a00 port_state x2
Apr 07 11:45:28 ICTM1608S01H1 kernel: nvme nvme5: NVME-FC{4}: controller connectivity lost. Awaiting Reconnect
Apr 07 11:45:28 ICTM1608S01H1 kernel: nvme nvme1: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Apr 07 11:45:28 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:45:28 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:45:28 ICTM1608S01H1 kernel: nvme nvme5: NVME-FC{4}: io failed due to lldd error 6
Apr 07 11:45:28 ICTM1608S01H1 kernel: nvme nvme1: NVME-FC{0}: io failed due to lldd error 6
Apr 07 11:45:29 ICTM1608S01H1 kernel: lpfc :af:00.0: 4:(0):6172 NVME rescanned DID x011400 port_state x2
Apr 07 11:45:29 ICTM1608S01H1 kernel: lpfc :18:00.0: 0:(0):6172 NVME rescanned DID x011400 port_state x2
Apr 07 11:45:29 ICTM1608S01H1 kernel: nvme nvme4: NVME-FC{1}: controller connectivity lost. Awaiting Reconnect
Apr 07 11:45:29 ICTM1608S01H1 kernel: nvme nvme8: NVME-FC{5}: controller connectivity lost. Awaiting Reconnect
Apr 07 11:45:29 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:45:29 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:45:29 ICTM1608S01H1 kernel: nvme nvme4: NVME-FC{1}: io failed due to lldd error 6
Apr 07 11:45:29 ICTM1608S01H1 kernel: nvme nvme8: NVME-FC{5}: io failed due to lldd error 6
Apr 07 11:45:59 ICTM1608S01H1 kernel: rport-10:0-9: blocked FC remote port time out: removing rport
Apr 07 11:45:59 ICTM1608S01H1 kernel: rport-16:0-9: blocked FC remote port time out: removing rport
Apr 07 11:45:59 ICTM1608S01H1 kernel: rport-15:0-9: blocked FC remote port time out: removing rport
Apr 07 11:45:59 ICTM1608S01H1 kernel: rport-12:0-9: blocked FC remote port time out: removing rport
Apr 07 11:46:28 ICTM1608S01H1 kernel: nvme nvme5: NVME-FC{4}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 07 11:46:28 ICTM1608S01H1 kernel: nvme nvme5: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 07 11:46:28 ICTM1608S01H1 kernel: nvme nvme1: NVME-FC{0}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 07 11:46:28 ICTM1608S01H1 kernel: nvme nvme1: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 07 11:46:29 ICTM1608S01H1 kernel: nvme nvme4: NVME-FC{1}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 07 11:46:29 ICTM1608S01H1 kernel: nvme nvme4: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 07 11:46:29 ICTM1608S01H1 kernel: nvme nvme8: NVME-FC{5}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 07 11:46:29 ICTM1608S01H1 kernel: nvme nvme8: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 07 11:47:07 ICTM1608S01H1 systemd-udevd[2896874]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:47:07 ICTM1608S01H1 systemd-udevd[2896874]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:47:08 ICTM1608S01H1 systemd-udevd[2896872]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:47:08 ICTM1608S01H1 systemd-udevd[2896874]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transp>
Apr 07 11:49:56 ICTM1608S01H1 root[2899783]: JD: Controller A online
Apr 07 11:50:04 ICTM1608S01H1 root[2899924]: nvme-subsys0 - NQN=nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2
Apr 07 11:50:04 ICTM1608S01H1 root[2899924]: \
Apr 07 11:50:04 ICTM1608S01H1 root[2899924]: +- nvme2 fc traddr=nn-0x200200a098d8580e:pn-0x202300a098d8580e host_traddr=nn-0x2090fadcc5ce>
Apr 07 11:50:04 ICTM1608S01H1 root[2899924]: +- nvme3 fc traddr=nn-0x200200a098d8580e:pn-0x201300a098d8580e host_traddr=nn-0x20109b8f2b8d>
Apr 07 11:50:04 ICTM1608S01H1 root[2899924]: +- nvme6 fc
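The timing argument can be checked directly against the timestamps quoted above. This small sketch only subtracts the logged clock times (all on Apr 07, so the date is ignored); the event labels are ours, and the two "JD" entries are the tester's own logger markers rather than kernel events.

```python
from datetime import datetime

# Clock times copied from the Apr 07 log above; the labels are ours.
EVENTS = {
    "reset issued (JD marker)": "11:45:10",
    "connectivity lost": "11:45:28",
    "FC rport removed": "11:45:59",
    "NVMe dev_loss_tmo expired": "11:46:28",
    "controller A online (JD marker)": "11:49:56",
}


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end; both are HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    origin = EVENTS["connectivity lost"]
    for label, ts in EVENTS.items():
        print(f"{label:32s} {ts}  {seconds_between(origin, ts):+d} s")
```

By this arithmetic the FC rports were removed 31 s after connectivity was lost, the NVMe controllers were removed at 60 s (matching `dev_loss_tmo (60)`), and the controller was marked online about 268 s after connectivity was lost, which is consistent with the estimate that the timeout would need to exceed 180 seconds.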
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
At the time a storage controller is failed, /var/log/syslog and journalctl show identical output:

Apr 7 11:45:28 ICTM1608S01H1 kernel: [586649.657080] lpfc :af:00.1: 5:(0):6172 NVME rescanned DID x3d0a00 port_state x2
Apr 7 11:45:28 ICTM1608S01H1 kernel: [586649.657268] lpfc :18:00.1: 1:(0):6172 NVME rescanned DID x3d0a00 port_state x2
Apr 7 11:45:28 ICTM1608S01H1 kernel: [586649.658064] nvme nvme5: NVME-FC{4}: controller connectivity lost. Awaiting Reconnect
Apr 7 11:45:28 ICTM1608S01H1 kernel: [586649.659036] nvme nvme1: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Apr 7 11:45:28 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200200a098d8580e:pn-0x202200a098d8580e\t--trsvcid=none\t--host-traddr=nn-0x2090fadcc5ce:pn-0x1090fadcc5ce.service' failed with exit code 1.
Apr 7 11:45:28 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200200a098d8580e:pn-0x202200a098d8580e\t--trsvcid=none\t--host-traddr=nn-0x20109b8f2b8e:pn-0x10109b8f2b8e.service' failed with exit code 1.
Apr 7 11:45:28 ICTM1608S01H1 kernel: [586649.680671] nvme nvme5: NVME-FC{4}: io failed due to lldd error 6
Apr 7 11:45:28 ICTM1608S01H1 kernel: [586649.703918] nvme nvme1: NVME-FC{0}: io failed due to lldd error 6
Apr 7 11:45:29 ICTM1608S01H1 kernel: [586650.469693] lpfc :af:00.0: 4:(0):6172 NVME rescanned DID x011400 port_state x2
Apr 7 11:45:29 ICTM1608S01H1 kernel: [586650.469715] lpfc :18:00.0: 0:(0):6172 NVME rescanned DID x011400 port_state x2
Apr 7 11:45:29 ICTM1608S01H1 kernel: [586650.470629] nvme nvme4: NVME-FC{1}: controller connectivity lost. Awaiting Reconnect
Apr 7 11:45:29 ICTM1608S01H1 kernel: [586650.471611] nvme nvme8: NVME-FC{5}: controller connectivity lost. Awaiting Reconnect
Apr 7 11:45:29 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200200a098d8580e:pn-0x201200a098d8580e\t--trsvcid=none\t--host-traddr=nn-0x2090fadcc5cd:pn-0x1090fadcc5cd.service' failed with exit code 1.
Apr 7 11:45:29 ICTM1608S01H1 systemd-udevd[2895178]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200200a098d8580e:pn-0x201200a098d8580e\t--trsvcid=none\t--host-traddr=nn-0x20109b8f2b8d:pn-0x10109b8f2b8d.service' failed with exit code 1.
Apr 7 11:45:29 ICTM1608S01H1 kernel: [586650.493222] nvme nvme4: NVME-FC{1}: io failed due to lldd error 6
Apr 7 11:45:29 ICTM1608S01H1 kernel: [586650.516848] nvme nvme8: NVME-FC{5}: io failed due to lldd error 6
Apr 7 11:45:59 ICTM1608S01H1 kernel: [586680.663369] rport-10:0-9: blocked FC remote port time out: removing rport
Apr 7 11:45:59 ICTM1608S01H1 kernel: [586680.663373] rport-16:0-9: blocked FC remote port time out: removing rport
Apr 7 11:45:59 ICTM1608S01H1 kernel: [586680.663377] rport-15:0-9: blocked FC remote port time out: removing rport
Apr 7 11:45:59 ICTM1608S01H1 kernel: [586680.663383] rport-12:0-9: blocked FC remote port time out: removing rport
Apr 7 11:46:28 ICTM1608S01H1 kernel: [586709.847350] nvme nvme5: NVME-FC{4}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 7 11:46:28 ICTM1608S01H1 kernel: [586709.847363] nvme nvme5: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 7 11:46:28 ICTM1608S01H1 kernel: [586709.847385] nvme nvme1: NVME-FC{0}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 7 11:46:28 ICTM1608S01H1 kernel: [586709.847395] nvme nvme1: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 7 11:46:29 ICTM1608S01H1 kernel: [586710.615343] nvme nvme4: NVME-FC{1}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 7 11:46:29 ICTM1608S01H1 kernel: [586710.615357] nvme nvme4: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 7 11:46:29 ICTM1608S01H1 kernel: [586710.615375] nvme nvme8: NVME-FC{5}: dev_loss_tmo (60) expired while waiting for remoteport connectivity.
Apr 7 11:46:29 ICTM1608S01H1 kernel: [586710.615389] nvme nvme8: Removing ctrl: NQN "nqn.1992-08.com.netapp:5700.600a098000d8580e5c0136a2"
Apr 7 11:47:07 ICTM1608S01H1 systemd-udevd[2896874]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200200a098d8580e:pn-0x201200a098d8580e\t--trsvcid=none\t--host-traddr=nn-0x2090fadcc5cd:pn-0x1090fadcc5cd.service' failed with exit code 1.
Apr 7 11:47:07 ICTM1608S01H1 systemd-udevd[2896874]: fc_udev_device: Process 'systemctl --no-block start
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
> Dan, where can I find the location of these logs?

They should be in /var/log/syslog and/or the journal, which you can check with:

for all logs:
$ journalctl

just logs for this boot:
$ journalctl -b

just logs for the nvmf-connect unit(s):
$ journalctl -u 'nvmf-connect*'

You probably only really need to check the last one, to get the logs specific to the nvmf-connect service failures.
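If the plain `journalctl -u 'nvmf-connect*'` output is awkward to collect or share, journalctl can also emit one JSON object per line with `-o json`. The sketch below is a hypothetical helper, not part of nvme-cli: it keeps only entries that belong to an `nvmf-connect@` unit, assuming the standard `_SYSTEMD_UNIT` (the logging process's unit), `UNIT` (set on systemd's own messages about a unit) and `MESSAGE` journal fields.

```python
from __future__ import annotations

import json
import shutil
import subprocess
import sys
from typing import Iterable


def nvmf_entries(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (unit, message) pairs for nvmf-connect@ entries in `journalctl -o json` lines."""
    found = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and non-JSON notices
        entry = json.loads(line)
        candidates = (entry.get("_SYSTEMD_UNIT", ""), entry.get("UNIT", ""))
        unit = next((u for u in candidates if u.startswith("nvmf-connect@")), None)
        if unit is not None:
            found.append((unit, str(entry.get("MESSAGE", ""))))
    return found


def main() -> None:
    if shutil.which("journalctl") is None:
        print("journalctl is not available here", file=sys.stderr)
        return
    # Same filter as `journalctl -b -u 'nvmf-connect*'`, but as JSON.
    proc = subprocess.run(
        ["journalctl", "--no-pager", "-b", "-u", "nvmf-connect*", "-o", "json"],
        capture_output=True, text=True, check=False,
    )
    for unit, message in nvmf_entries(proc.stdout.splitlines()):
        print(f"{unit}: {message}")


if __name__ == "__main__":
    main()
```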
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Dan, where can I find the location of these logs?
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
** Changed in: nvme-cli (Ubuntu) Status: Expired => Incomplete
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
[Expired for nvme-cli (Ubuntu) because there has been no activity for 60 days.] ** Changed in: nvme-cli (Ubuntu) Status: Incomplete => Expired
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
You'll need to check the specific output/log of the nvmf-connect@ services that are failing, e.g.:

Apr 21 16:48:53 ICTM1610S01H2 systemd-udevd[2946]: fc_udev_device: Process 'systemctl --no-block start nvmf-connect@--device=none\t--transport=fc\t--traddr=nn-0x200400a098d85eb4:pn-0x202400a098d85eb4\t--trsvcid=none\t--host-traddr=nn-0x2024ff1877fb:pn-0x2124ff1877fb.service' failed with exit code 1.

** Changed in: nvme-cli (Ubuntu) Status: Confirmed => Incomplete
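To see which connection a failing line like the one above refers to, the arguments can be recovered by splitting the instance on the literal `\t` separators the udev rule inserts. This is a hypothetical parsing helper written for this thread; it assumes the exact `fc_udev_device: Process '...' failed with exit code N.` wording shown in these logs.

```python
from __future__ import annotations

import re

FAILED_RE = re.compile(
    r"Process 'systemctl --no-block start nvmf-connect@(?P<inst>.+?)\.service' "
    r"failed with exit code (?P<code>\d+)"
)


def parse_failure(line: str) -> tuple[dict[str, str], int] | None:
    """Return ({option: value}, exit code) for a udev failure line, else None."""
    m = FAILED_RE.search(line)
    if m is None:
        return None
    opts = {}
    # The log shows a literal backslash-t between arguments, not a tab byte.
    for arg in m.group("inst").split("\\t"):
        key, _, value = arg.partition("=")
        opts[key.lstrip("-")] = value
    return opts, int(m.group("code"))


if __name__ == "__main__":
    sample = (r"fc_udev_device: Process 'systemctl --no-block start "
              r"nvmf-connect@--device=none\t--transport=fc\t--trsvcid=none"
              r".service' failed with exit code 1.")
    print(parse_failure(sample))
```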
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Status changed to 'Confirmed' because the bug affects multiple users. ** Changed in: nvme-cli (Ubuntu) Status: New => Confirmed
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
I am still seeing this with Ubuntu 20.04 LTS
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
Also, it does not look like Broadcom's website has an autoconnect script that supports Ubuntu.
[Bug 1874270] Re: NVMe/FC connections fail to reestablish after controller is reset
I've attached logs. ** Attachment added: "ICTM1610S01-messages-4-21-20.zip" https://bugs.launchpad.net/ubuntu/+source/nvme-cli/+bug/1874270/+attachment/5357988/+files/ICTM1610S01-messages-4-21-20.zip