Launchpad has imported 33 comments from the remote bug at https://bugs.freedesktop.org/show_bug.cgi?id=105811.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2018-03-29T17:26:07+00:00 Ahabig wrote:

Created attachment 138431
A photo of some of the kernel spew.

On my Fedora 27 Thinkpad T430, I have been unable to boot any of the 4.15.x kernels (4.14.18 and previous work just fine). The kernel crashes at the initial kms only 3s into the boot, before any logging to disk happens, making this one hard to provide information on.

Adding "nomodeset" to the boot lets the boot process go further: but as soon as a buffer flush tries to write to the system disk, it hangs (so again, no syslog info beyond a string of ^@'s). kdump doesn't help: that needs to write to disk, and this bug seems to be taking out that ability as it flails.

Complicating matters further is that the screen's backlight is turned off at the start of the kms (in a normal boot it comes back on later), rendering console messages hard to read. Luckily, I found out that with an external monitor, it will eventually (a minute or two later) light up, resulting in the attached screenshot which points to the i915 driver.

The initial bug was posted at Fedora's bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1557416
and they redirected me here (delayed by travel, sorry).

I see this Ubuntu bug, which shares a similar backtrace:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268
although my crash is not dependent on an external monitor.

Another similar backtrace is on your site here:
https://bugs.freedesktop.org/show_bug.cgi?id=104573
but that one doesn't seem fatal.

Ideas? bios update to the laptop didn't change anything. xorg update didn't change anything. F27 is now on xorg-x11-drv-intel-2.99.917-31-20171025.
All 4.15.x kernels they've put out do the same thing, the latest being 4.15.12-301

Thanks in advance - this has been bugging me for months, and all my usual debugging techniques have been stymied!

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/6

------------------------------------------------------------------------
On 2018-03-30T12:27:18+00:00 Jani-saarinen-g wrote:

Hi, can you try with the latest drm-tip (https://cgit.freedesktop.org/drm-tip), add the drm.debug=14 module parameter, and attach dmesg from boot to reproducing the problem. Thanks.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/8

------------------------------------------------------------------------
On 2018-03-30T13:26:47+00:00 Ahabig wrote:

Will try. Am not optimistic we'll get much, however, as the crash occurs before syslogd can write anything.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/9

------------------------------------------------------------------------
On 2018-03-30T14:24:13+00:00 Ahabig wrote:

Created attachment 138444
dmesg from initial boot into single user mode

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/10

------------------------------------------------------------------------
On 2018-03-30T14:24:57+00:00 Ahabig wrote:

Created attachment 138445
dmesg from boot into runlevel 3 after ntfs filesystems removed

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/11

------------------------------------------------------------------------
On 2018-03-30T14:25:57+00:00 Ahabig wrote:

Created attachment 138446
dmesg including logging after X started

same boot as dmesg-booted.tx, but with some additional logging after X was started.
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/12

------------------------------------------------------------------------
On 2018-03-30T14:30:31+00:00 Ahabig wrote:

ok, boot success!

First try fell into single user mode due to ntfs filesystems the drm-tip kernel didn't like (the first dmesg attachment). Commenting those out from fstab got me into runlevel three (the second dmesg attachment). X started just fine (I start it from the command line using startx), additional debugging logs are appended (the 3rd dmesg file).

Note that it didn't crash, so these logs, as happy as I am to get them, don't illuminate the problem (although they surely have lots of info about the intel chip in the system with the problem!).

Potential reasons include: Maybe 4.16 fixed the problem; or maybe the drm-tip kernel doesn't include some offending bit of code running in the Fedora kernel (eg, it obviously doesn't have ntfs support enabled).

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/13

------------------------------------------------------------------------
On 2018-03-30T15:52:01+00:00 Ahabig wrote:

Narrowed down the possibilities: just fished the 4.16.0-rc7 rpms for f28 out of koji: same crash, same lack of any entries into /var/log/messages whilst doing so.

So, the problem isn't fixed in the newer kernel: it then must be something that's in the Fedora kernel and not in the drm-tip kernel.

Ideas for how to narrow this down?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/14

------------------------------------------------------------------------
On 2018-04-25T11:46:09+00:00 Jani-saarinen-g wrote:

Jani, any advice here?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/15

------------------------------------------------------------------------
On 2018-04-25T11:54:52+00:00 Jani-nikula wrote:

[ 3.562059] bbswitch: loading out-of-tree module taints kernel.
[ 3.562266] bbswitch: version 0.8
[ 3.562272] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.VID_
[ 3.562278] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG_.VID_
[ 3.562286] ACPI Warning: \_SB.PCI0.PEG.VID._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20180105/nsarguments-100)
[ 3.562634] bbswitch: detected an Optimus _DSM function
[ 3.562644] pci 0000:01:00.0: enabling device (0000 -> 0003)
[ 3.562676] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on

Can you disable the discrete card and run without out-of-tree stuff?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/16

------------------------------------------------------------------------
On 2018-04-25T17:48:28+00:00 Ahabig wrote:

Good idea Jani! Removed bumblebee and the proprietary nvidia driver. This didn't fix things, but it did allow me to go further before things locked up. Including actual system logging! Attached a dmesg output I got before things fell over.

Upon booting, it seems the nouveau driver kms fails, then it falls over to i915. You can see nouveau initializing around 2.2 seconds in the log, then timing out at around 16 seconds. The first actual kernel crashes come later, starting with ieee80211 complaints. I was able to dmesg to a logfile and scp it away before things really unraveled. Hitting ctl-alt-del (all on console) induces a hard hang.

After blacklisting nouveau and rebuilding the initfs, it's back to the original problem of not even booting happily.

Disabling the discrete card in bios does not change this behavior. The tail of its problems again shows up eventually on my second monitor, looking similar to the initial attachments where drm_atomic_helper is timing out. Wait... as I type this it eventually got to single user mode, second dmesg dump retrieved before it hung (again on ctl-alt-del).
But, dumping that to a text file didn't make it to disk, the crash didn't flush the disk write buffer :(

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/17

------------------------------------------------------------------------
On 2018-04-25T17:49:21+00:00 Ahabig wrote:

Created attachment 139108
dmesg from boot with nouveau instead of proprietary driver.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/18

------------------------------------------------------------------------
On 2018-04-25T18:07:44+00:00 Ahabig wrote:

Slight correction - my first attempt to boot with discrete graphics enabled was foiled by there being a second bios setting that tries to autoset the graphics cards, despite me having just manually set it. REALLY forcing discrete graphics off got me back to the exact same "hang, doesn't write to syslog, but I could take a photo of it" problem as started this thread. Same kernel dump about ironlake and drm.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/19

------------------------------------------------------------------------
On 2018-04-25T18:11:24+00:00 Ahabig wrote:

... and, to close the loop: 4.14.18 continues to work great (well, without bumblebee now, because it's uninstalled), as does windows 7 (dual-booted).

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/20

------------------------------------------------------------------------
On 2018-04-26T07:07:40+00:00 Jani-nikula wrote:

(In reply to Alec Habig from comment #11)
> Created attachment 139108 [details]
> dmesg from boot with nouveau instead of proprietary driver.

Please add drm.debug=14 module parameter to get detailed debugging, and get the dmesg again. The attached dmesg doesn't seem to have anything out of the ordinary in graphics, the only warns come from wifi.
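
For readers wondering what the value in drm.debug=14 selects: the parameter is a bitmask of debug categories. The sketch below decodes such a value, assuming the low bits are assigned as in the kernel's drm_print.h of this era (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08, ATOMIC=0x10, VBL=0x20). The table and the function name decode_drm_debug are illustrative only and are not part of any kernel tool; higher bits are deliberately left out.

```python
from typing import List

# Assumed bit assignments for the drm.debug bitmask (low bits only).
# This table is an illustration; check drm_print.h in your kernel tree.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(value: int) -> List[str]:
    """Return the names of the debug categories enabled by `value`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if value & bit]


if __name__ == "__main__":
    # 14 == 0x0e == DRIVER | KMS | PRIME
    print(decode_drm_debug(14))
```

Under that assumption, 14 (0x0e) enables driver, KMS and PRIME messages while leaving the much noisier core messages off.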
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/21

------------------------------------------------------------------------
On 2018-04-26T20:15:44+00:00 Ahabig wrote:

ok. First, a note that yesterday's and today's tests were with kernel-4.15.17-300.fc27.x86_64

Couldn't get past a few seconds into the boot (the original failure mode) till I remembered that I'd taken the nouveau modules out of the initfs yesterday. Put it back, booted with drm.debug=14.

It certainly seems that without nouveau taking the hit for whatever goes wrong in the initial kms, things are toast. nouveau times out, hands off to i915, which finishes the kms. However, if there's no nouveau in the kernel, i915 fails the initial mode switch entirely.

Anyway, this boot went clear to runlevel 3 with no errors this time, so I logged on console as root, saved a dmesg (attached). Then exit'd from root's bash.... and things hung. A bit later these errors came to console:

Apr 26 14:47:24 entropy kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
Apr 26 14:47:24 entropy kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:pipe B] flip_done timed out

after which I leaned on the power switch to shut down. Since it didn't go down cleanly, "journalctl -b -1" can't count its instances correctly. But, things did go to /var/log/messages this time, so the complete /var/log/messages for that session is also attached. To sync the two different time series, the dmesg ends around 14:44:44 in messages.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/22

------------------------------------------------------------------------
On 2018-04-26T20:16:23+00:00 Ahabig wrote:

Created attachment 139146
dmesg from drm.debug=14 clean boot (which then hung after this log was copied...)
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/23

------------------------------------------------------------------------
On 2018-04-26T20:17:21+00:00 Ahabig wrote:

Created attachment 139147
/var/log/messages from that corresponding session, including an unfortunately sparse record of the actual hang.

Last line in the dmesg corresponds to 14:44:44 in this full log.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/24

------------------------------------------------------------------------
On 2018-04-27T06:22:51+00:00 Jani-saarinen-g wrote:

There are similar issues seen on many bugs eg. on IVB: https://bugs.freedesktop.org/show_bug.cgi?id=104573.
Ville, any idea for those?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/25

------------------------------------------------------------------------
On 2018-04-27T09:30:13+00:00 Jani-nikula wrote:

(In reply to Jani Saarinen from comment #18)
> There are similar issues seen on many bugs eg. on IVB:
> https://bugs.freedesktop.org/show_bug.cgi?id=104573.
> Ville, any idea for those?

Nothing similar about that AFAICT.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/26

------------------------------------------------------------------------
On 2018-04-27T09:32:28+00:00 Jani-nikula wrote:

I also don't see an actual crash or hang here anyway. Sure, the display may be toast, but per the logs systemd gracefully shuts down on power button press. Can you ssh into the machine?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/27

------------------------------------------------------------------------
On 2018-04-27T09:39:12+00:00 Jani-nikula wrote:

Finally, the Optimus stuff has always been a pain. Life suddenly feels too short when debugging issues with them. I know it sucks for you as a user, but please understand that they're really hard for us too.
We only have control over one piece of the puzzle, without many clues about the whole.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/28

------------------------------------------------------------------------
On 2018-04-27T14:07:57+00:00 Ahabig wrote:

sshing in let me continue to work on shell in the ssh session, after the console hung upon exiting. dmesg reports continued "flip timeout" errors as kms flails. "shutdown -r now" at this point, although booting the ssh session promptly as expected, takes a few minutes to complete, and seems to be waiting for the kms errors to complete. This clean boot did allow me to get a journalctl log of the whole process (will attach that).

I understand that optimus is a nasty mess. But, frustratingly, things work just fine with the fedora 4.14.* series, and just fine with your debugging tree kernel. As a user I suppose I can always go back to the bad old days of hand installing kernels rather than just loading them from the yum repository. Last time I did that regularly was going on 20 years ago :)

The fact that the errors happen with either fedora's 4.15.* tree or their 4.16 devel tree is telling us that something in the deltas they apply to their kernel build is the source of the problem. I suppose at this point, I go back to the original fedora bugzilla and put the ball back in their court? bisecting their configs and patches sounds like the next (tedious) step.

Before I go, though: what are these "flip timeout" errors? If I had to guess, any number of things horribly wrong in there could make the i915 module get into a funny state that eventually throws this error: but any insight as I move forward would be appreciated.
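
As a starting point for the config bisection mentioned above, a small script can list which options differ between a working and a failing kernel .config. This is only a sketch: the function names parse_kconfig and diff_kconfig are hypothetical, the two sample fragments are made up for illustration (they are not taken from the reporter's configs), and options missing from a file are treated as "n", which is a simplification.

```python
import re

# Lines look like "CONFIG_FOO=y" / "CONFIG_FOO=m" / "CONFIG_FOO=\"str\""
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
# Disabled options are recorded as "# CONFIG_FOO is not set"
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Parse the text of a kernel .config into {option: value}."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kconfig(working_text, failing_text):
    """Return {option: (working_value, failing_value)} for differing options."""
    working = parse_kconfig(working_text)
    failing = parse_kconfig(failing_text)
    return {
        name: (working.get(name, "n"), failing.get(name, "n"))
        for name in sorted(set(working) | set(failing))
        if working.get(name, "n") != failing.get(name, "n")
    }


if __name__ == "__main__":
    # Made-up fragments, for illustration only.
    working = "CONFIG_DRM_I915=m\n# CONFIG_NTFS_FS is not set\n"
    failing = "CONFIG_DRM_I915=m\nCONFIG_NTFS_FS=m\n"
    for name, (good, bad) in diff_kconfig(working, failing).items():
        print(f"{name}: working={good} failing={bad}")
```

The resulting list is the candidate set to bisect: flip half of the differing options, rebuild, and test. The kernel source tree also ships scripts/diffconfig, which does a similar comparison.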
Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/29

------------------------------------------------------------------------
On 2018-04-27T14:08:40+00:00 Ahabig wrote:

Created attachment 139175
full journalctl -b -1 -l dump obtained after being able to issue a clean reboot via ssh.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/30

------------------------------------------------------------------------
On 2018-04-27T14:53:09+00:00 Ahabig wrote:

I just asked the Fedora bugzilla about what patches they're putting on to their rpms, and they suggested:

> It's probably the opposite, there is something in the Free Desktop tree that hasn't yet made it into Linus' tree. If you can narrow down which patch it is we can bring it into Fedora.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/31

------------------------------------------------------------------------
On 2018-05-04T12:26:58+00:00 Jani-saarinen-g wrote:

Alec, any updates from you on this?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/32

------------------------------------------------------------------------
On 2018-05-04T13:08:05+00:00 Ahabig wrote:

No update since the Fedora suggestion above. How different is your development tree from the main kernel release?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/33

------------------------------------------------------------------------
On 2018-05-04T13:50:00+00:00 Jani-saarinen-g wrote:

Jani, like to comment or add something to https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-intel.html?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/34

------------------------------------------------------------------------
On 2018-05-10T15:55:33+00:00 Ahabig wrote:

Interesting development here. I decided to try the latest drm-tip (4.17.0-rc4) and see what happens.
It died: but looking at it harder, it seems my working drm-tip (4.16.0-rc7) had a much simpler .config file. I think the difference is I compiled the new one as root, which picked up all the config settings from my currently running fedora 4.14 kernel, while doing it as a normal user for 4.16 did not.

Anyway - copying over the 4.16 .config into the 4.17 build area, and being conservative with all the new config options the 4.17 build presents, it does boot again: although it does spend 30+ seconds at the boot flailing: before succeeding rather than failing.

So: something in fedora's kernel config is triggering my problem. So I can try bisecting kernel options till I find the specific troublemaker. There are a lot of them though.

But: it's not just that. Hopefully the additional debugging info in the 4.17 dmesg (attached) offers clues. You can see the drm module having problems: I do not know the severity of the problems, just that it did the same "spend 30-odd seconds early in the boot timing out" problem as it does when it fails, even though it then succeeds (with a simpler .config).

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/35

------------------------------------------------------------------------
On 2018-05-10T15:56:45+00:00 Ahabig wrote:

Created attachment 139469
dmesg from a successful boot with drm-tip 4.17.0-rc4, and a simpler .config.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/36

------------------------------------------------------------------------
On 2018-05-11T08:36:40+00:00 Jani-nikula wrote:

Per logs you have selftests enabled:

[ 0.377402] drm_mm: Testing DRM range manger (struct drm_mm), with random_seed=0x9292e37 max_iterations=8192 max_prime=128
[ 0.377756] drm_mm: igt_sanitycheck - ok!
[ 0.377962] igt_debug 0x0000000000000000-0x0000000000000200: 512: free
[ 0.377963] igt_debug 0x0000000000000200-0x0000000000000600: 1024: used
[ 0.377963] igt_debug 0x0000000000000600-0x0000000000000a00: 1024: free
[ 0.377964] igt_debug 0x0000000000000a00-0x0000000000000e00: 1024: used
[ 0.377965] igt_debug 0x0000000000000e00-0x0000000000001000: 512: free
[ 0.377966] igt_debug total: 4096, used 2048 free 2048

Use

CONFIG_DRM_DEBUG_SELFTEST=n
CONFIG_DRM_I915_DEBUG=n

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/37

------------------------------------------------------------------------
On 2018-05-14T20:12:19+00:00 Ahabig wrote:

Yes, that self-test thing was the delay early in boot with the simply-configured drm-tip kernel. Thanks! Working on testing various kernel configs to see which combination is the lethal one. This will take a while... (although less now that it boots faster :)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/38

------------------------------------------------------------------------
On 2018-08-21T04:51:38+00:00 Btmckee9 wrote:

I don't know if this will help, but I have an old Centrino i5 mobile running 4.18.1 as a media pc and if I connect an external monitor and boot I get a hard crash of the PC. I managed to get a complete dmesg just before the hang. This PC is intel only, not dual video.

https://pastebin.com/raw/g5n2J90k

I'm using this PC at the moment, on the external monitor which works fine if I plug it in after X has started.

I'll try to keep tabs on this thread. This is a Gentoo install.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751268/comments/39


** Bug watch added: freedesktop.org Bugzilla #104573
   https://bugs.freedesktop.org/show_bug.cgi?id=104573

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1751268

Title:
  bionic desktop does not boot with external monitor attached - [drm:ironlake_crtc_enable [i915]] *ERROR* mode set failed: pipe A stuck / vblank wait timed out on crtc 1

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1751268/+subscriptions
