[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230722

Warner Losh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|---                         |Normal
           Assignee|b...@freebsd.org            |i...@freebsd.org
             Status|New                         |Open

--
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #13 from Warner Losh ---
I understand what I must do next, but I haven't had time to make the changes for the next round of testing because of time off around the holidays, travel, etc.
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #11 from Alexandr Krivulya ---
With a kernel patched from the nvme-suspend branch, I see new messages in the log:

Dec 15 14:50:41 thinkpad kernel: pm_runtime_get_if_in_use not implemented -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: sched_setscheduler_nocheck not implemented -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: register_oom_notifier not implemented -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: [drm] Initialized i915 1.6.0 20171222 for drmn0 on minor 0
Dec 15 14:50:41 thinkpad kernel: register_acpi_notifier not implemented -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: async_schedule is dodgy -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: pm_runtime_set_autosuspend_delay not implemented -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: __pm_runtime_use_autosuspend not implemented -- see your local kernel hacker
Dec 15 14:50:41 thinkpad kernel: async_synchronize_cookie not implemented -- see your local kernel hacker

The kernel also panicked while suspending:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x4d8
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0x807ca21d
stack pointer           = 0x28:0xfe4857a0
frame pointer           = 0x28:0xfe4857c0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_down)
trap number             = 12
panic: page fault
cpuid = 3
time = 1544877759
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe485450
vpanic() at vpanic+0x1b4/frame 0xfe4854b0
panic() at panic+0x43/frame 0xfe485510
trap_fatal() at trap_fatal+0x369/frame 0xfe485560
trap_pfault() at trap_pfault+0x49/frame 0xfe4855c0
trap() at trap+0x29f/frame 0xfe4856d0
calltrap() at calltrap+0x8/frame 0xfe4856d0
--- trap 0xc, rip = 0x807ca21d, rsp = 0xfe4857a0, rbp = 0xfe4857c0 ---
nvme_qpair_submit_request() at nvme_qpair_submit_request+0x2d/frame 0xfe4857c0
nvme_ctrlr_submit_io_request() at nvme_ctrlr_submit_io_request+0x2f/frame 0xfe4857e0
nvme_ns_cmd_deallocate() at nvme_ns_cmd_deallocate+0x84/frame 0xfe485820
nvme_ns_bio_process() at nvme_ns_bio_process+0x257/frame 0xfe4858a0
nvd_strategy() at nvd_strategy+0x4d/frame 0xfe4858d0
g_disk_start() at g_disk_start+0x360/frame 0xfe485920
g_io_schedule_down() at g_io_schedule_down+0x1a1/frame 0xfe485960
g_down_procbody() at g_down_procbody+0x6d/frame 0xfe485970
fork_exit() at fork_exit+0x83/frame 0xfe4859b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe4859b0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
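For anyone comparing panics from different machines, the backtrace in comment #11 can be reduced to a call path with a few lines of Python. This is only a rough sketch: the frame text is copied from the comment, while the helper name `parse_frames` and the regular expression are illustrative assumptions, not part of any FreeBSD tool. The path it prints shows the faulting request entering the NVMe driver from the GEOM `g_down` thread through `nvme_ns_cmd_deallocate`. A fault address as small as 0x4d8 is often what a field access through a NULL structure pointer looks like, but that is an interpretation and is not confirmed anywhere in this thread.

```python
import re

# Frames copied from the backtrace in comment #11 (those below the trap frame).
BACKTRACE = """\
nvme_qpair_submit_request() at nvme_qpair_submit_request+0x2d/frame 0xfe4857c0
nvme_ctrlr_submit_io_request() at nvme_ctrlr_submit_io_request+0x2f/frame 0xfe4857e0
nvme_ns_cmd_deallocate() at nvme_ns_cmd_deallocate+0x84/frame 0xfe485820
nvme_ns_bio_process() at nvme_ns_bio_process+0x257/frame 0xfe4858a0
nvd_strategy() at nvd_strategy+0x4d/frame 0xfe4858d0
g_disk_start() at g_disk_start+0x360/frame 0xfe485920
g_io_schedule_down() at g_io_schedule_down+0x1a1/frame 0xfe485960
g_down_procbody() at g_down_procbody+0x6d/frame 0xfe485970
"""

# Matches "func() at func+0xOFF/frame 0xADDR" as printed by KDB.
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)$")


def parse_frames(text):
    """Return (function, offset) pairs, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames


frames = parse_frames(BACKTRACE)
# Reverse so the path reads from the outermost caller to the faulting function.
call_path = " -> ".join(name for name, _ in reversed(frames))
print(call_path)
```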
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #10 from Warner Losh ---
(In reply to Alexandr Krivulya from comment #6)
Just rebuild the kernel and/or module.
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #9 from mmata...@gmail.com ---
I uploaded photos of the kernel panic I get on resume. Hopefully they are useful.
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #8 from mmata...@gmail.com ---
Created attachment 200117
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=200117&action=edit
Kernel Panic while resuming 2
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #7 from mmata...@gmail.com ---
Created attachment 200116
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=200116&action=edit
Kernel Panic while resuming
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #6 from Alexandr Krivulya ---
Do I need to build the whole tree, or can I just patch these files?

sys/dev/nvme/nvme.c
sys/dev/nvme/nvme_ctrlr.c
sys/dev/nvme/nvme_private.h
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
--- Comment #5 from Warner Losh ---
I have some code in the nvme-suspend branch of my https://github.com/bsdimp/freebsd repository that you might want to test. I haven't had time to test it myself, so I'm looking for help testing it.
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
Alexandr Krivulya changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shur...@shurik.kiev.ua

--- Comment #4 from Alexandr Krivulya ---
I have not had any data corruption, but I prefer to shut down the system instead of suspending until there is some progress on this.
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
ultrageran...@bleu255.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ultrageran...@bleu255.com

--- Comment #3 from ultrageran...@bleu255.com ---
I'm seeing the same on a Thinkpad X1 (2018) with 12.0-PRERELEASE (r341261) and the following hardware:

nvme0@pci0:4:0:0: class=0x010802 card=0xa801144d chip=0xa808144d rev=0x00 hdr=0x00
    vendor     = 'Samsung Electronics Co Ltd'
    device     = 'NVMe SSD Controller SM981/PM981'
    class      = mass storage
    subclass   = NVM

I'm not sure whether it is related, but I suffered an unpleasant data loss during one of these hung resumes. Is it safe to use suspend while this issue is unresolved?
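When matching reports against each other, it can help to split the packed hex fields of the pciconf line in comment #3 into their individual IDs. The sketch below assumes the usual layout, with the device ID in the upper 16 bits of `chip` and the vendor ID in the lower 16, the same split for `card`, and `class` packed as base class, subclass, and programming interface. The function name `decode_pciconf` is made up for illustration.

```python
def decode_pciconf(chip, card, pci_class):
    """Split pciconf's packed hex fields into individual PCI IDs."""
    return {
        "vendor_id": chip & 0xFFFF,            # low 16 bits of chip=
        "device_id": chip >> 16,               # high 16 bits of chip=
        "subvendor_id": card & 0xFFFF,         # low 16 bits of card=
        "subdevice_id": card >> 16,            # high 16 bits of card=
        "base_class": (pci_class >> 16) & 0xFF,
        "subclass": (pci_class >> 8) & 0xFF,
        "prog_if": pci_class & 0xFF,
    }


# Values taken from the pciconf line quoted in comment #3.
ids = decode_pciconf(chip=0xA808144D, card=0xA801144D, pci_class=0x010802)
for name, value in ids.items():
    print(f"{name:13s} 0x{value:04x}")
```

Under that assumption the controller decodes to vendor 0x144d and device 0xa808, with base class 0x01 and subclass 0x08, which agrees with the "mass storage" and "NVM" strings pciconf printed.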
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
mmata...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmata...@gmail.com

--- Comment #2 from mmata...@gmail.com ---
I am experiencing the same problem; however, around 50% of the time I get a kernel panic or the machine simply becomes unresponsive. I have seen errors along the lines of the nvme device having disappeared. Unfortunately, the logs don't persist to disk because the disk is gone. I am running a new Dell XPS 13 at commit 338924 with drm-kmod-next.
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
Eric van Gyzen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vangy...@freebsd.org
           Severity|Affects Only Me             |Affects Some People

--- Comment #1 from Eric van Gyzen ---
I get these on my Dell XPS 13 running head, too.

Aug 12 20:35:51 hammy kernel: nvme0: Resetting controller due to a timeout.
Aug 12 20:35:51 hammy kernel: nvme0: resetting controller
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: READ sqid:1 cid:103 nsid:1 lba:296718939 len:24
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:103 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: READ sqid:1 cid:106 nsid:1 lba:147433251 len:186
Aug 12 20:35:51 hammy kernel: nvme0: nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:106 cdw0:0
Aug 12 20:35:51 hammy kernel: async event occurred (type 0x0, info 0x00, page 0x01)
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: WRITE sqid:2 cid:125 nsid:1 lba:290334796 len:7
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:125 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: DATASET MANAGEMENT sqid:3 cid:84 nsid:1
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:3 cid:84 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: DATASET MANAGEMENT sqid:3 cid:94 nsid:1
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:3 cid:94 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: READ sqid:4 cid:126 nsid:1 lba:261212309 len:7
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:4 cid:126 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: WRITE sqid:5 cid:87 nsid:1 lba:290332732 len:16
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:5 cid:87 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: READ sqid:5 cid:86 nsid:1 lba:107551651 len:4
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:5 cid:86 cdw0:0
Aug 12 20:35:51 hammy kernel: nvme0: aborting outstanding i/o
Aug 12 20:35:51 hammy kernel: nvme0: READ sqid:6 cid:115 nsid:1 lba:312559547 len:5
Aug 12 20:35:51 hammy kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:6 cid:115 cdw0:0
Aug 12 20:35:51 hammy acpi[1696]: resumed at 20180812 20:35:51
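To compare resume logs like the one in comment #1 across machines, a short script can pair each command line with its matching abort line and count what was in flight when the controller was reset. This is a hedged sketch, not an existing tool: the sample lines are a subset copied from the log above with the syslog prefix dropped, and the names `summarize`, `CMD_RE` and `ABORT_RE` are illustrative. The "(00/07)" field is taken here to be status code type 0 with status code 0x07, which the NVMe specification defines as "Command Abort Requested".

```python
import re
from collections import Counter

# A subset of the log from comment #1, without the syslog prefix.
LOG = """\
nvme0: READ sqid:1 cid:103 nsid:1 lba:296718939 len:24
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:103 cdw0:0
nvme0: WRITE sqid:2 cid:125 nsid:1 lba:290334796 len:7
nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:125 cdw0:0
nvme0: DATASET MANAGEMENT sqid:3 cid:84 nsid:1
nvme0: ABORTED - BY REQUEST (00/07) sqid:3 cid:84 cdw0:0
"""

CMD_RE = re.compile(r"nvme\d+: (READ|WRITE|DATASET MANAGEMENT) sqid:(\d+) cid:(\d+)")
ABORT_RE = re.compile(r"ABORTED - BY REQUEST \(00/07\) sqid:(\d+) cid:(\d+)")


def summarize(log):
    """Count aborted commands per opcode, matched by (sqid, cid)."""
    pending = {}          # (sqid, cid) -> opcode of the command line seen last
    aborted = Counter()   # opcode -> number of aborts
    for line in log.splitlines():
        cmd = CMD_RE.search(line)
        if cmd:
            pending[(int(cmd.group(2)), int(cmd.group(3)))] = cmd.group(1)
            continue
        abort = ABORT_RE.search(line)
        if abort:
            key = (int(abort.group(1)), int(abort.group(2)))
            aborted[pending.pop(key, "UNKNOWN")] += 1
    return aborted


summary = summarize(LOG)
for opcode, count in sorted(summary.items()):
    print(f"{opcode}: {count}")
```

Keying on the (sqid, cid) pair rather than on line order keeps the pairing correct even when kernel messages interleave, as happens on one line of the log above.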
[Bug 230722] [nvme][resume] Delay and possible deadlock after resume
            Bug ID: 230722
           Summary: [nvme][resume] Delay and possible deadlock after resume
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: go...@freebsd.org
                CC: i...@freebsd.org, jimhar...@freebsd.org

When resuming from S3 sleep, all disk activity stops for around 30 seconds and the kernel prints a bunch of error messages from the NVMe driver. Eventually, the system resumes normal operation. I have also seen this scenario result in a deadlock, although I can't reproduce it reliably anymore.

Error messages printed by the kernel:

nvme0: Resetting controller due to a timeout.
nvme0: resetting controller
nvme0: aborting outstanding i/o
nvme0: READ sqid:1 cid:111 nsid:1 lba:98381272 len:8
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:111 cdw0:0
nvme0: aborting outstanding i/o
nvme0: READ sqid:1 cid:124 nsid:1 lba:626180072 len:8
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:124 cdw0:0
nvme0: aborting outstanding i/o
nvme0: READ sqid:2 cid:121 nsid:1 lba:617174512 len:40
nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:121 cdw0:0
nvme0: aborting outstanding i/o
nvme0: DATASET MANAGEMENT sqid:4 cid:94 nsid:1
nvme0: ABORTED - BY REQUEST (00/07) sqid:4 cid:94 cdw0:0
nvme0: aborting outstanding i/o
nvme0: WRITE sqid:4 cid:93 nsid:1 lba:46557808 len:16
nvme0: ABORTED - BY REQUEST (00/07) sqid:4 cid:93 cdw0:0
nvme0: Missing interrupt

Controller information:

Controller Capabilities/Features
================================
Vendor ID:                   17aa
Subsystem Vendor ID:         17aa
Serial Number:               1150592304774
Model Number:                LENSE20512GMSP34MEAT2TA
Firmware Version:            1.9.8341
Recommended Arb Burst:       2
IEEE OUI Identifier:         99 32 a0
Multi-Path I/O Capabilities: Not Supported
Max Data Transfer Size:      131072
Controller ID:               0x01
Version:                     0.0.0

Admin Command Set Attributes
============================
Security Send/Receive:       Supported
Format NVM:                  Supported
Firmware Activate/Download:  Supported
Namespace Managment:         Not Supported
Device Self-test:            Not Supported
Directives:                  Not Supported
NVMe-MI Send/Receive:        Not Supported
Virtualization Management:   Not Supported
Doorbell Buffer Config       Not Supported
Abort Command Limit:         4
Async Event Request Limit:   4
Number of Firmware Slots:    1
Firmware Slot 1 Read-Only:   No
Per-Namespace SMART Log:     No
Error Log Page Entries:      4
Number of Power States:      5

NVM Command Set Attributes
==========================
Submission Queue Entry Size
  Max:                       64
  Min:                       64
Completion Queue Entry Size
  Max:                       16
  Min:                       16
Number of Namespaces:        1
Compare Command:             Not Supported
Write Uncorrectable Command: Supported
Dataset Management Command:  Supported
Write Zeroes Command:        Not Supported
Save Features:               Supported
Reservations:                Not Supported
Timestamp feature:           Not Supported
Fused Operation Support:     Not Supported
Format NVM Attributes:       Crypto Erase, Per-NS Erase, Per-NS Format
Volatile Write Cache:        Present