Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Wed, Nov 25, 2015 at 06:01:20AM +0100, Juergen Gross wrote:
> On 24/11/15 23:46, Luis R. Rodriguez wrote:
> > On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
> >> On 23/11/15 15:11, vas...@iit.demokritos.gr wrote:
> >>> Ok I will send the .config when I get back home. I have all kernels I
> >>> build in .deb archive. The problem is that the debian kernel build
> >>> procedure does not hold somewhere in the deb file the git commit hash.
> >>>
> >>> For which kernel would you care to see the config? 4.3?
> >>
> >> Doesn't really matter anymore. I've posted a patch already to fix it and
> >> got the reply that the fix is okay, but no harm can come from the
> >> current implementation, as the two config options are always either both
> >> set or reset.
> >
> > Hrm, Vassilis seems to be able to reproduce this more effectively by
> > heating up his CPU prior to hibernation though. I have no idea what adding
> > APIC_LVT_MASKED ((1 << 16)) to the Local Vector Table (LVT) Thermal
> > Monitor (APIC_LVTTHMR 0x330) does, but clear_local_APIC() seems to be
> > used to "cleanout any BIOS leftovers during boot." If we're suspending but
> > the fan is still on, I wonder if this could cause an issue with some
> > settings the BIOS may have set prior to hibernation, and a mismatch upon
> > resume.
> >
> > I can't find what APIC_LVT_MASKED does though, the best doc I found:
> > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
> > Local APIC (chapter 10.4).

Thanks, yeah, I only see the same thing you spotted and fixed [0], but I
also agree it does not play a role in this issue. Although it is completely
undocumented, APIC_LVT_MASKED just masks the thermal interrupts while we go
down, and we simply restore the original value of the thermal register when
we come up. The only other place calling for a cautious reading of the
thermal register seemed to be x86-32 specific.
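A minimal user-space sketch of the save/mask/restore pattern described above, assuming only the two constants quoted in this thread (APIC_LVT_MASKED as 1 << 16 and APIC_LVTTHMR at 0x330). The register is simulated by a plain variable and every helper name is hypothetical; in the kernel the real register is read and written with apic_read()/apic_write(), and this is not that code.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the thread. The real register lives at offset
 * APIC_LVTTHMR in the local APIC; here it is simulated by a variable. */
#define APIC_LVTTHMR    0x330       /* LVT Thermal Monitor register offset */
#define APIC_LVT_MASKED (1u << 16)  /* mask bit of an LVT entry */

/* Hypothetical stand-in for the local APIC thermal LVT register. */
static uint32_t fake_lvtthmr;

/* Going down: save the current entry, then set only the mask bit so the
 * thermal interrupt is not delivered. Vector and mode bits are untouched. */
static uint32_t lvt_thermal_mask_and_save(void)
{
    uint32_t saved = fake_lvtthmr;
    fake_lvtthmr = saved | APIC_LVT_MASKED;
    return saved;
}

/* Coming up: write back exactly what was saved, so the entry ends up
 * unmasked only if it was unmasked before suspend. */
static void lvt_thermal_restore(uint32_t saved)
{
    fake_lvtthmr = saved;
}

static int lvt_is_masked(uint32_t v)
{
    return (v & APIC_LVT_MASKED) != 0;
}

/* One simulated suspend/resume round trip; returns the value after resume. */
static uint32_t lvt_thermal_round_trip(uint32_t initial)
{
    fake_lvtthmr = initial;
    uint32_t saved = lvt_thermal_mask_and_save();
    assert(lvt_is_masked(fake_lvtthmr));  /* masked while "down" */
    lvt_thermal_restore(saved);
    return fake_lvtthmr;
}
```

For example, lvt_thermal_round_trip(0xfa) returns 0xfa and lvt_thermal_round_trip(0x100fa) returns 0x100fa (the vector 0xfa is arbitrary): restoring the saved word, rather than clearing bit 16 unconditionally, keeps whatever masking state was in place before suspend.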
Let's see what the bisect ends up with.

[0] https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=42baa2581c92f8d07e7260506c8d41caf14b0fc3

Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 24/11/15 23:46, Luis R. Rodriguez wrote:
> On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
>> On 23/11/15 15:11, vas...@iit.demokritos.gr wrote:
>>> Ok I will send the .config when I get back home. I have all kernels I
>>> build in .deb archive. The problem is that the debian kernel build
>>> procedure does not hold somewhere in the deb file the git commit hash.
>>>
>>> For which kernel would you care to see the config? 4.3?
>>
>> Doesn't really matter anymore. I've posted a patch already to fix it and
>> got the reply that the fix is okay, but no harm can come from the
>> current implementation, as the two config options are always either both
>> set or reset.
>
> Hrm, Vassilis seems to be able to reproduce this more effectively by heating
> up his CPU prior to hibernation though. I have no idea what adding
> APIC_LVT_MASKED ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor
> (APIC_LVTTHMR 0x330) does, but clear_local_APIC() seems to be used to
> "cleanout any BIOS leftovers during boot." If we're suspending but the fan
> is still on, I wonder if this could cause an issue with some settings the
> BIOS may have set prior to hibernation, and a mismatch upon resume.
>
> I can't find what APIC_LVT_MASKED does though, the best doc I found:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
Local APIC (chapter 10.4).


Juergen
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
> On 23/11/15 15:11, vas...@iit.demokritos.gr wrote:
> > Ok I will send the .config when I get back home. I have all kernels I
> > build in .deb archive. The problem is that the debian kernel build
> > procedure does not hold somewhere in the deb file the git commit hash.
> >
> > For which kernel would you care to see the config? 4.3?
>
> Doesn't really matter anymore. I've posted a patch already to fix it and
> got the reply that the fix is okay, but no harm can come from the
> current implementation, as the two config options are always either both
> set or reset.

Hrm, Vassilis seems to be able to reproduce this more effectively by heating
up his CPU prior to hibernation though. I have no idea what adding
APIC_LVT_MASKED ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor
(APIC_LVTTHMR 0x330) does, but clear_local_APIC() seems to be used to
"cleanout any BIOS leftovers during boot." If we're suspending but the fan is
still on, I wonder if this could cause an issue with some settings the BIOS
may have set prior to hibernation, and a mismatch upon resume.

I can't find what APIC_LVT_MASKED does though, the best doc I found:

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/white-papers/cpu-monitoring-dts-peci-paper.pdf

The inability to set the MTRR for the i915 card might be a totally separate
issue at this point, not sure. One could test that, I suppose, by just using
the vesa graphics driver (disabling i915) to at least get a basic screen to
see things and compile/test things.

Luis
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Tue, Nov 24, 2015 at 01:01:31AM +0200, Vassilis Virvilis wrote:
> On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
> >It's not clear from the log who made this MTRR call for WC that failed. I
> >hope we didn't attempt a WC write on a WB region. Who owns
> >e000-efff?
>
> How can I answer that? Is there any utility to run? Peek inside /proc?
>
> [0.221012] pci :00:02.0: [8086:0412] type 00 class 0x03
> [0.221021] pci :00:02.0: reg 0x10: [mem 0xf780-0xf7bf 64bit]
> [0.221025] pci :00:02.0: reg 0x18: [mem 0xe000-0xefff 64bit pref]
> [0.221028] pci :00:02.0: reg 0x20: [io 0xf000-0xf03f]
> ...
> [0.453783] calling sysfb_init+0x0/0x96 @ 1
> [0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe000, 0x6bb000 bytes, mapped to 0xc9000200
> [0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
> [0.557233] Console: switching to colour frame buffer device 210x65
> [0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
> [0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs
> ...
> [9.745108] calling i915_init+0x0/0xa2 [i915] @ 403
> [9.745542] [drm] Memory usable by graphics device = 2048M
> [9.745544] checking generic (e000 6bb000) vs hw (e000 1000)
> [9.745544] fb: switching to inteldrmfb from simple
> ...
> [9.943166] Console: switching to colour dummy device 80x25
> [9.943240] [drm] Replacing VGA console driver
> [9.943520] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
> [9.943526] Failed to add WC MTRR for [e000-efff]; performance may suffer.
> [9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [9.949728] [drm] Driver supports precise vblank timestamp query.
> [9.949801] vgaarb: device changed decodes: PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> ...
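The "type mismatch ... old: write-back new: write-combining" line can be explained with a simplified model of the overlap check done before a new variable MTRR is programmed. The sketch below is loosely modelled on the checks in the kernel's arch/x86/kernel/cpu/mtrr/ code but is not that code: the compatibility rule is an assumption, sizes are in MB to sidestep the mangled hex in the quoted log, the table is the /proc/mtrr output Vassilis posted, and the 256 MB size of the WC request is assumed from the "[e000-efff]" range. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mem_type { MT_UC, MT_WC, MT_WB };

struct mtrr_range {
    uint64_t base_mb;   /* start of the range, in MB */
    uint64_t size_mb;   /* length of the range, in MB */
    enum mem_type type;
};

enum add_result { ADD_OK, ADD_OVERLAP, ADD_TYPE_MISMATCH };

/* The /proc/mtrr table posted in this thread, converted to MB. */
static const struct mtrr_range posted_mtrrs[] = {
    {0,     16384, MT_WB},  /* reg00 */
    {16384, 512,   MT_WB},  /* reg01 */
    {3584,  512,   MT_UC},  /* reg02 */
    {3328,  256,   MT_UC},  /* reg03 */
    {3312,  16,    MT_UC},  /* reg04 */
    {16880, 16,    MT_UC},  /* reg05 */
    {16878, 2,     MT_UC},  /* reg06 */
};
#define N_POSTED (sizeof posted_mtrrs / sizeof posted_mtrrs[0])

/* Assumed compatibility rule: equal types may coexist, and UC may overlap
 * anything because UC wins wherever it applies. WB and WC do not mix. */
static int types_compatible(enum mem_type a, enum mem_type b)
{
    return a == b || a == MT_UC || b == MT_UC;
}

/* Simplified model of the check made before programming a new variable
 * MTRR [base, base + size). On ADD_TYPE_MISMATCH, *old_type receives the
 * type of the existing entry that blocked the request. */
static enum add_result check_new_mtrr(const struct mtrr_range *regs, size_t n,
                                      uint64_t base, uint64_t size,
                                      enum mem_type type,
                                      enum mem_type *old_type)
{
    assert(size > 0 && old_type != NULL);
    uint64_t end = base + size - 1;

    for (size_t i = 0; i < n; i++) {
        if (regs[i].size_mb == 0)
            continue;                           /* unused register */
        uint64_t lbase = regs[i].base_mb;
        uint64_t lend = lbase + regs[i].size_mb - 1;
        if (base > lend || end < lbase)
            continue;                           /* no overlap */

        if (base < lbase || end > lend) {
            /* Not inside this entry: only acceptable if the new range
             * fully encloses it and the types are compatible. */
            if (base <= lbase && end >= lend &&
                types_compatible(type, regs[i].type))
                continue;
            return ADD_OVERLAP;
        }

        /* The new range lies entirely inside an existing entry. */
        if (!types_compatible(type, regs[i].type)) {
            *old_type = regs[i].type;
            return ADD_TYPE_MISMATCH;
        }
    }
    return ADD_OK;
}
```

With the posted table and a 256 MB WC request at 3584 MB, the first entry that encloses the request is reg00 (write-back), so the model returns ADD_TYPE_MISMATCH with old type WB, which lines up with the "old: write-back" wording even though reg02 marks the same area uncachable. A WB request over the same range passes, because it matches reg00 and UC (reg02) is treated as compatible.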
> $lspci | grep 00:02.0
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
>
> Looks like it is the graphics card or the graphics driver.

Good job, yes.

> I don't know if this is relevant
> $ cat /proc/mtrr
> reg00: base=0x0 (0MB), size=16384MB, count=1: write-back
> reg01: base=0x4 (16384MB), size= 512MB, count=1: write-back
> reg02: base=0x0e000 ( 3584MB), size= 512MB, count=1: uncachable

Right, so it tried to set this to WC but failed. When PAT is in use, MTRR
is not used; PAT is used instead, and your log showed no error.

> reg03: base=0x0d000 ( 3328MB), size= 256MB, count=1: uncachable
> reg04: base=0x0cf00 ( 3312MB), size= 16MB, count=1: uncachable
> reg05: base=0x41f00 (16880MB), size= 16MB, count=1: uncachable
> reg06: base=0x41ee0 (16878MB), size=2MB, count=1: uncachable
>
> >
> >What does your log show right before and after this? To find out try:
> >
> >dmesg | grep -5 -i mtrr
> >
>
> See full dmesg attached
>
> $dmesg | grep -5 -i mtrr
> [0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
> [0.189336] calling pt_init+0x0/0x2a4 @ 1
> [0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
> [0.189352] calling bts_init+0x0/0xa4 @ 1
> [0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs
> [0.189357] calling mtrr_if_init+0x0/0x5f @ 1
> [0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [0.189362] calling ffh_cstate_init+0x0/0x26 @ 1
> [0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs
> [0.189366] calling activate_jump_labels+0x0/0x2d @ 1
> [0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs
> [0.189370] calling kcmp_cookies_init+0x0/0x31 @ 1
> --
> [0.189424] calling dmi_id_init+0x0/0x300 @ 1
> [0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs
> [0.189450] calling pci_arch_init+0x0/0x63 @ 1
> [0.189458] PCI: MMCONFIG for domain [bus 00-3f] at [mem 0xf800-0xfbff] (base 0xf800)
> [0.189462] PCI: MMCONFIG at [mem 0xf800-0xfbff] reserved in E820
> [0.189467] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with a huge-page mapping due to MTRR override.
> [0.189514] PCI: Using configuration type 1 for base access
> [0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs
> [0.189528] calling init_vdso+0x0/0x44 @ 1
> [0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs
> [0.189538] calling sysenter_setup+0x0/0x52 @ 1
> --
> [0.189542] calling topology_init+0x0/0x83 @ 1
> [0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs
> [0.189798] calling fixup_ht_bug+0x0/0xed @ 1
> [0.189799] perf_event_intel: PMU erratum BJ122, BV98,
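Luis's point that with PAT the WC request does not go through the MTRRs can be illustrated by how the CPU combines the two: the effective type of an access depends on the MTRR type of the physical range and the PAT type of the page mapping. The sketch encodes only the subset of the effective-memory-type table from the PAT section of the Intel SDM, Vol. 3A, that matters here (MTRR UC/WC/WB, PAT UC/UC-/WC/WB); WT and WP are left out, and the enum and function names are hypothetical.

```c
#include <assert.h>

enum mtrr_type { MTRR_UC, MTRR_WC, MTRR_WB };
enum pat_type  { PAT_UC, PAT_UC_MINUS, PAT_WC, PAT_WB };
enum eff_type  { EFF_UC, EFF_WC, EFF_WB };

/* Subset of the SDM combination table; WT and WP are deliberately
 * omitted from this sketch. */
static enum eff_type effective_type(enum mtrr_type mtrr, enum pat_type pat)
{
    switch (pat) {
    case PAT_UC:
        return EFF_UC;                 /* strong UC always wins */
    case PAT_WC:
        return EFF_WC;                 /* PAT WC wins over these MTRR types */
    case PAT_UC_MINUS:
        /* UC- behaves as UC unless the MTRR says WC. */
        return mtrr == MTRR_WC ? EFF_WC : EFF_UC;
    case PAT_WB:
        /* PAT WB defers to the MTRR type. */
        switch (mtrr) {
        case MTRR_UC: return EFF_UC;
        case MTRR_WC: return EFF_WC;
        case MTRR_WB: return EFF_WB;
        }
        break;
    }
    assert(!"unhandled MTRR/PAT combination");
    return EFF_UC;
}
```

Under this table a page mapped WC through PAT, which is what ioremap_wc() asks for when PAT is enabled, is effectively WC even inside reg02's uncachable range, whereas a WB page there stays UC; that is why no MTRR has to be changed on the PAT path.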
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Tue, Nov 24, 2015 at 11:36:54AM +0200, vas...@iit.demokritos.gr wrote:
> > Let's try to speed up reproducing this.
> >
> > I have a hunch this might be related to some BIOS-controlled MTRRs and
> > a mismatch which then leads the kernel to think that a type of MTRR
> > write might be OK when in fact it's not. Given the workload description,
> > perhaps this could be related to fan control, the BIOS's control over
> > it, and some other device's MTRR. More on this suspicion on the other
> > thread where you provided more logs.
> >
> > On a kernel that you know fails, can you try replacing this workload by
> > bringing your CPU to its knees quickly, perhaps with 'make -j' on a Linux
> > build for 2, 4, 8 or 16 minutes, and then hitting CTRL-C and proceeding
> > to hibernation, to see if triggering the CPU fan accelerates the issue?
> > If 'make -j' is so heavy that you can't even CTRL-C it, try
> > 'make -j 16'. Note that if this is true, then a hot CPU could still
> > trigger CPU fan controls on a fresh boot if the previous boot was CPU
> > intensive.
>
> OK, that nailed it: with kernel 4.3, a known "bad" kernel, I was able to
> reproduce it in the second hibernate/resume cycle.

Great, glad we could reduce the time to reproduce to what seems to be a few
minutes now.

> Here is what I did in my own words so you can spot inconsistencies.
>
> I started a kernel compile with make -j 32. My computer stayed very
> responsive, which is an impressive feat by the way.
> In a second tab in my Konsole (I am running KDE) I ran 'watch sensors'. I
> watched the temperature of the cores go from 38 to ~70 and the CPU fan
> from ~1630 to ~1900. The first time, I hit Ctrl+C to stop the compilation
> and hibernated from KDE. I always hibernate from the KDE start menu.
> Previously I had made some tests where I was hibernating from the VT
> console (although sddm may have been running in VT7) and I managed to
> reproduce it, so (in my mind) it was not graphics-mode specific. From that
> point on I have always hibernated from KDE.

Come to think of it, the mtrr_add() and/or ioremap_wc() calls would be
triggered on driver initialization, that is at probe / boot time. So if the
issue you are running into is a clash between the BIOS's own notion of which
MTRR type is set and the MTRR type (or equivalent PAT type) a driver later
asks for, then the issue could be triggered just at boot / hibernation /
resume time, without much interaction, at least on the graphics front.

> The first time it worked. The second time I thought, why hit Ctrl+C?
> Let's try to hibernate with the compilation running. And it failed.

OK. How long did you leave the machine idle before resuming? Can you try,
on a fresh boot, to bring the temperature up to ~70 and, while it's still
compiling, hibernate and see if that triggers it? If we can reduce it to a
single hibernate that should reduce the time to troubleshoot; it is also
just puzzling that you'd need to hibernate twice to reproduce this issue.

> Now I don't know if it failed because it was the second cycle, or
> because the load of the compilation was there, or because of the
> temperature-controlled fan register you mentioned.

If it's fan related, one test could be to hibernate on a fresh boot once
fan control is on, let it sit to cool, and then resume, versus just resuming
right away. That is: determine whether we need fan control to be idle upon
resume or not, and also how many times fan control has to go on / off
before you can reproduce.

> Then I repeated the test with a known good kernel 3.18 (which should be
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs;
> I have a problem there, see below) and it survived the same test
> (hibernate two times with temperature being ~70).
> >
> > If this doesn't do it, let's try forcing an MTRR-capable driver; graphics
> > is the obvious target, so perhaps try some 3D stuff or a screen saver
> > prior to hibernation. Note that even if you boot with nomtrr the BIOS may
> > still use MTRRs, and PAT use on Linux could assume MTRR is not being used
> > by drivers, but the BIOS may still do something behind the scenes. This
> > is actually one reason why we can't exactly remove MTRR support from
> > Linux: the BIOS may still do some wacky stuff with MTRRs. One example I
> > was given was that CPU fan control might use WC MTRRs, so the kernel must
> > be aware of this even if no MTRRs are ever used by the Linux kernel at
> > all (this is the case now as of v4.3 and onwards).
> >
> > If that doesn't help speed it up, maybe try screen saver + some 3D stuff
> > + CPU-intensive stuff together.
>
> I have 3D effects enabled in my KDE. Since your tip succeeded in
> reproducing the problem early I didn't bother, but if I should test 3D,
> which program / benchmark should I run? glxgears?

As I mentioned above I can't think now of a reason why
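To make the "bring it up to ~70 and hibernate while still compiling" step repeatable, the temperature check can be scripted instead of eyeballing 'watch sensors'. A minimal sketch, assuming the reading comes from a sysfs thermal zone file such as /sys/class/thermal/thermal_zone0/temp, which reports millidegrees Celsius; which zone tracks the CPU package varies by machine, so the path is an assumption, the helper names are hypothetical, and the 70 degree threshold is just the figure reported in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs temperature string such as "70123\n", which thermal zones
 * report in millidegrees Celsius. Stores whole degrees in *deg_c and
 * returns 0, or returns -1 if the string is not a plain integer. */
static int parse_millidegrees(const char *s, long *deg_c)
{
    assert(s != NULL && deg_c != NULL);
    char *end;
    errno = 0;
    long milli = strtol(s, &end, 10);
    if (end == s || errno == ERANGE)
        return -1;
    while (*end == '\n' || *end == ' ')
        end++;                      /* tolerate the trailing newline */
    if (*end != '\0')
        return -1;
    *deg_c = milli / 1000;
    return 0;
}

/* Hypothetical helper: 1 if the zone at `path` is at or above threshold_c,
 * 0 if below, -1 if the file cannot be read or parsed. */
static int zone_is_hot(const char *path, long threshold_c)
{
    char buf[32];
    long deg;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (ok == NULL || parse_millidegrees(buf, &deg) != 0)
        return -1;
    return deg >= threshold_c;
}
```

For example, zone_is_hot("/sys/class/thermal/thermal_zone0/temp", 70) could be polled once a second while the 'make -j 32' build runs, and the hibernation started (from the KDE menu, as before) once it returns 1.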
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
> Let's try to speed up reproducing this.
>
> I have a hunch this might be related to some BIOS-controlled MTRRs and a
> mismatch which then leads the kernel to think that a type of MTRR write
> might be OK when in fact it's not. Given the workload description, perhaps
> this could be related to fan control, the BIOS's control over it, and some
> other device's MTRR. More on this suspicion on the other thread where you
> provided more logs.
>
> On a kernel that you know fails, can you try replacing this workload by
> bringing your CPU to its knees quickly, perhaps with 'make -j' on a Linux
> build for 2, 4, 8 or 16 minutes, and then hitting CTRL-C and proceeding to
> hibernation, to see if triggering the CPU fan accelerates the issue? If
> 'make -j' is so heavy that you can't even CTRL-C it, try 'make -j 16'. Note
> that if this is true, then a hot CPU could still trigger CPU fan controls
> on a fresh boot if the previous boot was CPU intensive.

OK, that nailed it: with kernel 4.3, a known "bad" kernel, I was able to
reproduce it in the second hibernate/resume cycle.

Here is what I did in my own words so you can spot inconsistencies.

I started a kernel compile with make -j 32. My computer stayed very
responsive, which is an impressive feat by the way.
In a second tab in my Konsole (I am running KDE) I ran 'watch sensors'. I
watched the temperature of the cores go from 38 to ~70 and the CPU fan from
~1630 to ~1900. The first time, I hit Ctrl+C to stop the compilation and
hibernated from KDE. I always hibernate from the KDE start menu. Previously
I had made some tests where I was hibernating from the VT console (although
sddm may have been running in VT7) and I managed to reproduce it, so (in my
mind) it was not graphics-mode specific. From that point on I have always
hibernated from KDE.

The first time it worked. The second time I thought, why hit Ctrl+C? Let's
try to hibernate with the compilation running. And it failed.

Now I don't know if it failed because it was the second cycle, or because
the load of the compilation was there, or because of the
temperature-controlled fan register you mentioned.

Then I repeated the test with a known good kernel 3.18 (which should be
773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs; I
have a problem there, see below) and it survived the same test (hibernate
two times with temperature being ~70).

> If this doesn't do it, let's try forcing an MTRR-capable driver; graphics is
> the obvious target, so perhaps try some 3D stuff or a screen saver prior to
> hibernation. Note that even if you boot with nomtrr the BIOS may still use
> MTRRs, and PAT use on Linux could assume MTRR is not being used by drivers,
> but the BIOS may still do something behind the scenes. This is actually one
> reason why we can't exactly remove MTRR support from Linux: the BIOS may
> still do some wacky stuff with MTRRs. One example I was given was that CPU
> fan control might use WC MTRRs, so the kernel must be aware of this even if
> no MTRRs are ever used by the Linux kernel at all (this is the case now as
> of v4.3 and onwards).
>
> If that doesn't help speed it up, maybe try screen saver + some 3D stuff +
> CPU-intensive stuff together.

I have 3D effects enabled in my KDE. Since your tip succeeded in reproducing
the problem early I didn't bother, but if I should test 3D, which program /
benchmark should I run? glxgears?

>
> To help you speed up testing you can try reducing your build time by
> reducing the amount of crap you have to build:
>
> make localmodconfig
>
> That should only build things your kernel has loaded as modules or is
> already enabled (=y).
>

Thanks for the tip. I don't want to change that right now. I don't mind
waiting a little bit, because I get a deb with the kernel and can retest a
known configuration. The other tip you gave, if it actually works as it
seems to, would speed up the debugging cycle so much that I would actually
become the bottleneck.

>
> That is commit a023748d53c10850650fe86b1c4a7d421d576451
> ("Merge branch 'x86-mm-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> Git is smart enough to tell you you've hit a merge commit and that all the
> possible commits on that merge could be the issue. This is why your bisect
> log shows a slew of commits. The next step is to bisect through the merge
> and then bisect through that; this will then let us identify the exact
> commit that may have caused the issue.
>
> There are a few ways to do this; my preferred way is to "unfold" a merge
> commit manually.
>
> To help keep things separate (without affecting other tests you might have
> on your other git tree, and to avoid forcing you to lose fresh objects as
> you continue to build-test on the other tree), I'd do something like this:

We will go with your preferred way, no question about that.

>
> mkdir ~/tmp
> git
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Tue, Nov 24, 2015 at 01:01:31AM +0200, Vassilis Virvilis wrote: > On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote: > >Its not clear from the log who called this MTRR call for WC that failed, I > >hope we didn't attempt a WC wright on a WB region. Who owns > >e000-efff ? > > How can I answer that? Is there any utility to run? peek inside /proc? > > [0.221012] pci :00:02.0: [8086:0412] type 00 class 0x03 > [0.221021] pci :00:02.0: reg 0x10: [mem 0xf780-0xf7bf 64bit] > [0.221025] pci :00:02.0: reg 0x18: [mem 0xe000-0xefff 64bit > pref] > [0.221028] pci :00:02.0: reg 0x20: [io 0xf000-0xf03f] ... > [0.453783] calling sysfb_init+0x0/0x96 @ 1 > [0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at > 0xe000, 0x6bb000 bytes, mapped to 0xc9000200 > [0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, > mode=1680x1050x32, linelength=6720 > [0.557233] Console: switching to colour frame buffer device 210x65 > [0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb > registered! > [0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs ... > [9.745108] calling i915_init+0x0/0xa2 [i915] @ 403 > [9.745542] [drm] Memory usable by graphics device = 2048M > [9.745544] checking generic (e000 6bb000) vs hw (e000 1000) > [9.745544] fb: switching to inteldrmfb from simple ... > [9.943166] Console: switching to colour dummy device 80x25 > [9.943240] [drm] Replacing VGA console driver > [9.943520] mtrr: type mismatch for e000,1000 old: write-back new: > write-combining > [9.943526] Failed to add WC MTRR for [e000-efff]; > performance may suffer. > [9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). > [9.949728] [drm] Driver supports precise vblank timestamp query. > [9.949801] vgaarb: device changed decodes: > PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem ... 
> $lspci | grep 00:02.0 > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen > Core Processor Integrated Graphics Controller (rev 06) > > Looks like it is the graphics card or the graphics driver. Good job yes. > I don't know if this is relevant > $ cat /proc/mtrr > reg00: base=0x0 (0MB), size=16384MB, count=1: write-back > reg01: base=0x4 (16384MB), size= 512MB, count=1: write-back > reg02: base=0x0e000 ( 3584MB), size= 512MB, count=1: uncachable Right so it tried to set this to WC but failed, and when using PAT MTRR is not used instead PAT is used and your log showed no error. > reg03: base=0x0d000 ( 3328MB), size= 256MB, count=1: uncachable > reg04: base=0x0cf00 ( 3312MB), size= 16MB, count=1: uncachable > reg05: base=0x41f00 (16880MB), size= 16MB, count=1: uncachable > reg06: base=0x41ee0 (16878MB), size=2MB, count=1: uncachable > > > > >What does your log show right before and after this? To find out try: > > > >dmesg | grep -5 -i mtrr > > > > See full dmesg attached > > $dmesg | grep -5 -i mtrr > [0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs > [0.189336] calling pt_init+0x0/0x2a4 @ 1 > [0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs > [0.189352] calling bts_init+0x0/0xa4 @ 1 > [0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs > [0.189357] calling mtrr_if_init+0x0/0x5f @ 1 > [0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs > [0.189362] calling ffh_cstate_init+0x0/0x26 @ 1 > [0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs > [0.189366] calling activate_jump_labels+0x0/0x2d @ 1 > [0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs > [0.189370] calling kcmp_cookies_init+0x0/0x31 @ 1 > -- > [0.189424] calling dmi_id_init+0x0/0x300 @ 1 > [0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs > [0.189450] calling pci_arch_init+0x0/0x63 @ 1 > [0.189458] PCI: MMCONFIG for domain [bus 00-3f] at [mem > 
0xf800-0xfbff] (base 0xf800) > [0.189462] PCI: MMCONFIG at [mem 0xf800-0xfbff] reserved in E820 > [0.189467] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with > a huge-page mapping due to MTRR override. > [0.189514] PCI: Using configuration type 1 for base access > [0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs > [0.189528] calling init_vdso+0x0/0x44 @ 1 > [0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs > [0.189538] calling sysenter_setup+0x0/0x52 @ 1 > -- > [0.189542] calling topology_init+0x0/0x83 @ 1 > [0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs > [0.189798] calling fixup_ht_bug+0x0/0xed @ 1 > [0.189799] perf_event_intel: PMU erratum BJ122, BV98,
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote: > On 23/11/15 15:11, vas...@iit.demokritos.gr wrote: > > Ok I will send the .config when I get back home. I have all kernels I > > build in .deb archive. The problem is that the debian kernel build > > procedure does not hold somewhere in the deb file the git commit hash. > > > > Fow which kernel would you care to see the config? 4.3? > > Doesn't really matter anymore. I've posted a patch already to fix it and > got the reply, that the fix is okay, but no harm can come from the > current implementation, as the two config options are always either both > set or reset. Hrm, Vassilis seems to be able to reproduce this more effectively by heating up his CPU prior to hibernation though. I have no idea what adding APIC_LVT_MASKED ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor (APIC_LVTTHMR 0x330) does but clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during boot." If we're suspending but the fan is still on I wonder if this could cause an issue with some settings the BIOS may have set prior to hibernation, and a mismatch upon resume. I can't find what APIC_LVT_MASKED does though, the best doc I found: https://www-ssl.intel.com/content/dam/www/public/us/en/documents/white-papers/cpu-monitoring-dts-peci-paper.pdf The inability to set the MTRR for the i915 card might be totally separate issue at this point, not sure. One could test that I suppose by just using vesa graphics card driver (disabling i915) to at least get a basic screen to see things and compile/test things. Luis -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Tue, Nov 24, 2015 at 11:36:54AM +0200, vas...@iit.demokritos.gr wrote: > > Let's try to speed up reproducing this. > > > > I have a hunch perhaps this might be related to some BIOS controlled > > MTRRs and a mismatch which then enables the kernel to think that a type > > of MTRR write might be OK, but in fact its not. Due to the work load > > description of this perhaps this could be related to fan control and BIOS > > control on them and against some other device MTRR. More on this suspicion > > on another thread where you provide more logs. > > > > On a kernel that you know fails can you try replacing this work load by > > making > > you CPU crawl to its knees quickly, perhaps 'make -j' on Linux building > > for 2, > > 4, 8, 16, minutes and then hit CTRL C to continue to hibernation to see if > > making the CPU fan trigger would accelerate the issue. If 'make -j' is > > too nuts > > to the point you can't even CTRL C it, try 'make -j 16' . Note that if > > this is > > true then that means a hot CPU could still trigger CPU fan controls on on > > a > > fresh boot if the previous boot was CPU intensive. > > OK that nailed it - with kernel 4.3 a known "bad" kernel I was able to > reproduce it in the second hibernate/resume cycle. Great, glad we could reduce the amount of time to reproduce to what seems to be a few minutes now. > Here is what I did in my own words so you can spot inconsistencies. > > I started a kernel compile with make -j 32. My computer was very > responsive which is an impressive feat by the way. > In a second tab in my Konsole (I am running KDE) I run $watch sensors. I > watched the temperature of the cores to go from 38 to ~70 and the cpu fan > from ~1630 to ~1900. Then the first time I hit Ctrl+C - stopped the > compilation and hibernated from the KDE. I always hibernate from the KDE > start menu. 
Previously I had made some tests where I was hibernating from > the VT console (although sddm may was running in VT7) and I have managed > to reproduce it - so (in my mind) it was not graphics mode specific. From > that point I am always hibernating from KDE. Come to think of it, the mtrr_add() and/or ioremap_wc() calls would be triggered on driver initialization, that is on probe / boot time, so if this issue you are running into is a clash of the BIOS's own notion of what is set for an MTRR type and later another driver's desired MTRR desired type (or equivalent PAT type) then the issue could be triggered just with boot time / hibernation / resume time without much interaction at least on the graphics front. > The first time it worked. For the second time I thought - why to hit > Ctrl+C let's try to hibernate with the compilation running - and it > failed. OK. How long did you leave the machine on idle before resuming? Can you try on a fresh boot to bring up temperature to ~70 and while its still compiling hibernate and see if that triggers it ? If we can reduce it to only one hibernate that should reduce time to troubleshoot, it is also just puzzling you'd need to hibernate twice to reproduce this issue. > Now I don't know if it failed because it was the second cycle or > because the load of the compilation was there or because of the > temperature controlled fan register you mentioned. If its fan related one test could be to hibertane on a fresh boot once fan control is one, let it sit to cool, and then resume. Vs just resuming right away. Ie: determine if we need fan control to be idle upon resume or not, also how many times does fan control have to go on / off before you can reproduce. > Then I repeated the test with a known good kernel 3.18 (which should be > 773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs - > I have a problem there - see below) and it survived the same test > (hibernate two times with temperature being ~70). 
> >
> > If this doesn't do it, let's try forcing an MTRR-capable driver;
> > graphics is the obvious target, so try perhaps some 3D stuff or a
> > screen saver prior to hibernation. Note that even if you boot with
> > nomtrr the BIOS may still use MTRRs, and PAT use on Linux could assume
> > MTRRs are not being used by drivers while the BIOS may still do
> > something behind the scenes. This is actually one reason why we can't
> > simply remove MTRR support from Linux: the BIOS may still do some wacky
> > stuff with MTRRs. One example I was given was that CPU fan control
> > might use WC MTRRs, so the kernel must be aware of this even if no
> > MTRRs are ever used by the Linux kernel at all, which is the case as
> > of v4.3 and onwards.
> >
> > If that doesn't help speed it up, maybe try a screen saver + some 3D
> > stuff + CPU-intensive stuff all together.
>
> I have 3D effects enabled in my KDE. Since your tip succeeded in
> reproducing the problem early I didn't bother, but if I should test 3D,
> which program / benchmark should I run? glxgears?

As I mentioned above I can't think now of a reason why
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 24/11/15 23:46, Luis R. Rodriguez wrote:
> On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
>> On 23/11/15 15:11, vas...@iit.demokritos.gr wrote:
>>> Ok I will send the .config when I get back home. I have all kernels I
>>> build in .deb archive. The problem is that the debian kernel build
>>> procedure does not store the git commit hash anywhere in the deb file.
>>>
>>> For which kernel would you care to see the config? 4.3?
>>
>> Doesn't really matter anymore. I've posted a patch already to fix it and
>> got the reply that the fix is okay, but that no harm can come from the
>> current implementation, as the two config options are always either both
>> set or reset.
>
> Hrm, Vassilis seems to be able to reproduce this more effectively by
> heating up his CPU prior to hibernation though. I have no idea what
> adding APIC_LVT_MASKED ((1 << 16)) to the Local Vector Table (LVT)
> Thermal Monitor (APIC_LVTTHMR 0x330) does, but clear_local_APIC() seems
> to be used to "cleanout any BIOS leftovers during boot." If we're
> suspending but the fan is still on, I wonder if this could cause an
> issue with some settings the BIOS may have set prior to hibernation, and
> a mismatch upon resume.
>
> I can't find what APIC_LVT_MASKED does though, the best doc I found:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf

Local APIC (chapter 10.4).

Juergen
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
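For readers following the APIC_LVT_MASKED question: each entry of the local APIC's Local Vector Table carries a mask bit at bit 16, which is what the kernel's APIC_LVT_MASKED ((1 << 16)) names, and APIC_LVTTHMR (0x330) is the offset of the thermal sensor entry. The sketch below works on plain 32-bit values rather than real APIC registers, and the helper names are hypothetical; it only illustrates the masking arithmetic and the save / mask / restore order described in this thread.

```c
#include <stdint.h>

#define APIC_LVTTHMR    0x330        /* thermal sensor LVT entry offset */
#define APIC_LVT_MASKED (1u << 16)   /* LVT mask bit */

/* Set only the mask bit; vector and delivery fields are left untouched. */
uint32_t lvt_mask(uint32_t v)
{
    return v | APIC_LVT_MASKED;
}

int lvt_is_masked(uint32_t v)
{
    return (v & APIC_LVT_MASKED) != 0;
}

/* Model of the sequence around hibernation: remember the original value,
 * mask the entry while going down, write the remembered value back on
 * the way up. 'reg' is an ordinary variable standing in for the register. */
uint32_t mask_then_restore(uint32_t *reg)
{
    uint32_t saved = *reg;      /* suspend side: save */
    *reg = lvt_mask(*reg);      /* going down: mask thermal interrupts */
    *reg = saved;               /* resume side: restore original value */
    return *reg;
}
```

Masking does not clear the entry's other fields, so restoring the saved value is enough to bring the thermal interrupt back exactly as it was.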
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
> Let's try to speed up reproducing this.
>
> I have a hunch perhaps this might be related to some BIOS-controlled
> MTRRs and a mismatch which then leads the kernel to think that a type
> of MTRR write might be OK, when in fact it's not. Given the workload
> description, perhaps this could be related to fan control, BIOS control
> over it, and some other device's MTRR. More on this suspicion on
> another thread where you provide more logs.
>
> On a kernel that you know fails, can you try replacing this workload by
> bringing your CPU to its knees quickly, perhaps with 'make -j' on Linux,
> building for 2, 4, 8 or 16 minutes, and then hit CTRL-C and continue to
> hibernation, to see if making the CPU fan trigger would accelerate the
> issue. If 'make -j' is too nuts, to the point you can't even CTRL-C it,
> try 'make -j 16'. Note that if this is true, then a hot CPU could still
> trigger CPU fan controls on a fresh boot if the previous boot was CPU
> intensive.

OK, that nailed it - with kernel 4.3, a known "bad" kernel, I was able to
reproduce it in the second hibernate/resume cycle.

Here is what I did in my own words so you can spot inconsistencies.

I started a kernel compile with make -j 32. My computer was very
responsive, which is an impressive feat by the way. In a second tab in my
Konsole (I am running KDE) I ran $watch sensors. I watched the temperature
of the cores go from 38 to ~70 and the CPU fan from ~1630 to ~1900. Then,
the first time, I hit Ctrl+C, stopped the compilation and hibernated from
KDE. I always hibernate from the KDE start menu. Previously I had made
some tests where I was hibernating from the VT console (although sddm may
have been running in VT7) and I managed to reproduce it - so (in my mind)
it was not graphics-mode specific. From that point on I have always
hibernated from KDE.

The first time it worked.
The second time I thought - why hit Ctrl+C, let's try to hibernate with
the compilation running - and it failed.

Now I don't know if it failed because it was the second cycle, or because
the load of the compilation was there, or because of the
temperature-controlled fan register you mentioned.

Then I repeated the test with a known good kernel 3.18 (which should be
773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs -
I have a problem there - see below) and it survived the same test
(hibernate two times with temperature being ~70).

> If this doesn't do it, let's try forcing an MTRR-capable driver;
> graphics is the obvious target, so try perhaps some 3D stuff or a screen
> saver prior to hibernation. Note that even if you boot with nomtrr the
> BIOS may still use MTRRs, and PAT use on Linux could assume MTRRs are
> not being used by drivers while the BIOS may still do something behind
> the scenes. This is actually one reason why we can't simply remove MTRR
> support from Linux: the BIOS may still do some wacky stuff with MTRRs.
> One example I was given was that CPU fan control might use WC MTRRs, so
> the kernel must be aware of this even if no MTRRs are ever used by the
> Linux kernel at all, which is the case as of v4.3 and onwards.
>
> If that doesn't help speed it up, maybe try a screen saver + some 3D
> stuff + CPU-intensive stuff all together.

I have 3D effects enabled in my KDE. Since your tip succeeded in
reproducing the problem early I didn't bother, but if I should test 3D,
which program / benchmark should I run? glxgears?

> To help you speed up testing you can try reducing your build time by
> reducing the amount of crap you have to build:
>
> make localmodconfig
>
> That should only build things your kernel has loaded as modules or
> already has enabled (=y).

Thanks for the tip. I don't want to change that right now.
I don't mind waiting a little bit because I get a deb with the kernel and
can retest a known configuration. The other tip you gave, if it actually
works as it looks like it does, would give a great boost to the debugging
cycle, to the point of actually making me the bottleneck.

> That is commit a023748d53c10850650fe86b1c4a7d421d576451
> ("Merge branch 'x86-mm-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> Git is smart enough to tell you you've hit a merge commit and that all
> the possible commits in that merge could be the issue. This is why your
> bisect log shows a slew of commits. The next step is to bisect through
> the commits in the merge; this will then let us identify the exact
> commit that may have caused the issue.
>
> There are a few ways to do this; my preferred way is to "unfold" a merge
> commit manually.
>
> To keep things separate (without affecting other tests you might have
> on your other git tree, and to avoid forcing you to lose fresh objects
> as you continue to build-test on the other tree), I'd do something like
> this:

We will go with your preferred way - no question about that.

>
> mkdir ~/tmp
> git
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
> It's not clear from the log who made this MTRR call for WC that failed;
> I hope we didn't attempt a WC write on a WB region. Who owns e000-efff?

How can I answer that? Is there any utility to run? Peek inside /proc?
Here is an idea:

$dmesg | grep -i -5 e000
[0.220941] pci_bus :00: root bus resource [mem 0x000e4000-0x000e7fff window]
[0.220944] pci_bus :00: root bus resource [mem 0xdf20-0xfeaf window]
[0.220950] pci :00:00.0: [8086:0c00] type 00 class 0x06
[0.221012] pci :00:02.0: [8086:0412] type 00 class 0x03
[0.221021] pci :00:02.0: reg 0x10: [mem 0xf780-0xf7bf 64bit]
[0.221025] pci :00:02.0: reg 0x18: [mem 0xe000-0xefff 64bit pref]
[0.221028] pci :00:02.0: reg 0x20: [io 0xf000-0xf03f]
[0.221081] pci :00:03.0: [8086:0c0c] type 00 class 0x040300
[0.221089] pci :00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit]
[0.221163] pci :00:14.0: [8086:8cb1] type 00 class 0x0c0330
[0.221184] pci :00:14.0: reg 0x10: [mem 0xf7c2-0xf7c2 64bit]
--
[0.453765] calling ioapic_init_ops+0x0/0xf @ 1
[0.453767] initcall ioapic_init_ops+0x0/0xf returned 0 after 0 usecs
[0.453770] calling add_pcspkr+0x0/0x3b @ 1
[0.453781] initcall add_pcspkr+0x0/0x3b returned 0 after 8 usecs
[0.453783] calling sysfb_init+0x0/0x96 @ 1
[0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe000, 0x6bb000 bytes, mapped to 0xc9000200
[0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
[0.557233] Console: switching to colour frame buffer device 210x65
[0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
[0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs
[0.661266] calling audit_classes_init+0x0/0xaa @ 1
--
[9.744397] input: gspca_zc3xx as /devices/pci:00/:00:14.0/usb3/3-3/input/input18
[9.744481] usbcore: registered new interface driver gspca_zc3xx
[9.744484] initcall sd_driver_init+0x0/0x1000 [gspca_zc3xx] returned 0 after 319 usecs
[9.745108] calling i915_init+0x0/0xa2 [i915] @ 403
[9.745542] [drm] Memory usable by graphics device = 2048M
[9.745544] checking generic (e000 6bb000) vs hw (e000 1000)
[9.745544] fb: switching to inteldrmfb from simple
[9.745831] calling alsa_seq_device_init+0x0/0x1000 [snd_seq_device] @ 384
[9.745842] initcall alsa_seq_device_init+0x0/0x1000 [snd_seq_device] returned 0 after 9 usecs
[9.746179] calling hmac_module_init+0x0/0x1000 [hmac] @ 471
[9.746180] initcall hmac_module_init+0x0/0x1000 [hmac] returned 0 after 0 usecs
--
[9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[9.751163] usbcore: registered new interface driver snd-usb-audio
[9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[9.943166] Console: switching to colour dummy device 80x25
[9.943240] [drm] Replacing VGA console driver
[9.943520] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
[9.943526] Failed to add WC MTRR for [e000-efff]; performance may suffer.
[9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS
[9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[9.949728] [drm] Driver supports precise vblank timestamp query.
[9.949801] vgaarb: device changed decodes: PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

$lspci | grep 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Looks like it is the graphics card or the graphics driver.

I don't know if this is relevant:

$ cat /proc/mtrr
reg00: base=0x0 (0MB), size=16384MB, count=1: write-back
reg01: base=0x4 (16384MB), size= 512MB, count=1: write-back
reg02: base=0x0e000 ( 3584MB), size= 512MB, count=1: uncachable
reg03: base=0x0d000 ( 3328MB), size= 256MB, count=1: uncachable
reg04: base=0x0cf00 ( 3312MB), size= 16MB, count=1: uncachable
reg05: base=0x41f00 (16880MB), size= 16MB, count=1: uncachable
reg06: base=0x41ee0 (16878MB), size=2MB, count=1: uncachable

> What does your log show right before and after this? To find out try:
> dmesg | grep -5 -i mtrr

See full dmesg attached.

$dmesg | grep -5 -i mtrr
[0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
[0.189336] calling pt_init+0x0/0x2a4 @ 1
[0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
[0.189352] calling bts_init+0x0/0xa4 @ 1
[0.189354] initcall bts_init+0x0/0xa4
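Another way to answer "who owns this range" besides grepping dmesg is /proc/iomem, which lists physical address ranges together with the resource that claimed each one (for example a PCI device such as the 00:02.0 graphics controller above), with nesting shown by indentation. A minimal C sketch, assuming the usual "start-end : name" line format; on some kernels unprivileged readers see zeroed addresses, so it may need to run as root. The function names are hypothetical and the sample lines in the test are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/iomem line such as "  e0000000-efffffff : name".
 * If addr lies inside the line's range, copy the resource name into out
 * and return 1; otherwise (or if the line does not parse) return 0. */
int iomem_line_owns(const char *line, unsigned long long addr,
                    char *out, size_t outlen)
{
    unsigned long long start, end;
    int name_off = 0;

    if (sscanf(line, " %llx-%llx : %n", &start, &end, &name_off) != 2 ||
        name_off == 0)
        return 0;
    if (addr < start || addr > end)
        return 0;
    snprintf(out, outlen, "%s", line + name_off);
    out[strcspn(out, "\n")] = '\0';     /* drop the trailing newline */
    return 1;
}

/* Print every resource, parent windows and children alike, containing addr. */
void print_iomem_owners(FILE *iomem, unsigned long long addr)
{
    char line[256], owner[200];

    while (fgets(line, sizeof(line), iomem))
        if (iomem_line_owns(line, addr, owner, sizeof(owner)))
            printf("%s\n", owner);
}
```

Called with a `FILE *` opened on /proc/iomem (checking for NULL first), this prints the bus window and then the device BAR that contain the address, which is the same information as the "reg 0x18" line in the dmesg output above but without depending on boot-log wording.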
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Sat, Nov 21, 2015 at 01:49:06PM +0200, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
> >On 20/11/15 11:04, vas...@iit.demokritos.gr wrote:
> >>>I've just found a potential issue: In case MTRR is disabled by the BIOS
> >>>the PAT register of the boot processor won't be restored after resume.
> >>>
> >>>Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> >>>early boot? If yes, this might be a BIOS option.
> >>>
> >>
> >>I don't have access right now. I will test it later tonight (This is my
> >>home machine).
> >>
> >>Would $dmesg | grep -i mtrr suffice or do I need to look for the mtrr
> >>somewhere else, e.g. /proc, /sys etc.?
> >
> >I think grepping for MTRR in dmesg should be enough.
>
> kernel 4.3 + nopat also died on the 4th or the 5th hibernate at the
> familiar (see previously attached image) "Calling lapic..." place.
>
> $dmesg | grep -i mtr for 4.3 kernel with nopat
> [0.189113] calling mtrr_if_init+0x0/0x5f @ 1
> [0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [0.189222] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with
> a huge-page mapping due to MTRR override.
> [0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
> [0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> [8.994140] mtrr: type mismatch for e000,1000 old: write-back new:
> write-combining
> [8.994154] Failed to add WC MTRR for [e000-efff];
> performance may suffer.

It's not clear from the log who made this MTRR call for WC that failed;
I hope we didn't attempt a WC write on a WB region. Who owns e000-efff?

What does your log show right before and after this? To find out try:

dmesg | grep -5 -i mtrr

Not being able to use WC is not fatal, it's just a performance issue, but
if we tried to switch to WC a region that we should not have, one that
the BIOS might rely on not being WC, that could be a big issue.
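The "type mismatch ... old: write-back new: write-combining" message can be read with a simplified model: a new write-combining range that lies entirely inside an existing variable MTRR of a different type is refused (in the `cat /proc/mtrr` listing posted in this thread, reg00 covers 0 to 16384MB as write-back), and the driver then carries on without the WC MTRR. This is a sketch of that idea only, not the kernel's MTRR code, which handles additional cases; the types, names and addresses are hypothetical.

```c
#include <stdint.h>

enum mem_type { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };   /* local labels */

struct var_mtrr {
    uint64_t base, size;        /* bytes */
    enum mem_type type;
};

enum add_result { ADD_OK, ADD_TYPE_MISMATCH, ADD_OVERLAP };

/* Check a requested [base, base + size) range of 'type' against the
 * existing variable ranges. */
enum add_result check_new_range(const struct var_mtrr *regs, int n,
                                uint64_t base, uint64_t size,
                                enum mem_type type)
{
    uint64_t end = base + size - 1;

    for (int i = 0; i < n; i++) {
        uint64_t lbase = regs[i].base;
        uint64_t lend = lbase + regs[i].size - 1;

        if (regs[i].size == 0 || base > lend || end < lbase)
            continue;                   /* no overlap with this register */
        if (base < lbase || end > lend)
            return ADD_OVERLAP;         /* straddles an existing range */
        if (regs[i].type != type)
            return ADD_TYPE_MISMATCH;   /* enclosed, but the type differs */
    }
    return ADD_OK;
}
```

Under this model a WC request inside a WB-covered range always fails, which fits Luis's reading that the failure itself only costs performance; the open question in the thread is whether the WB/WC disagreement hides something more serious.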
> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> [0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [0.189478] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with
> a huge-page mapping due to MTRR override.
> [0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> [0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

The fact that we don't see a conflict doesn't mean an issue or conflict
didn't trigger. It could be that PAT didn't see something the BIOS did,
which made the kernel assume it could do something that it was not able
to. The MTRR init code should pick up on this stuff and let the kernel
PAT code know if there could be a conflict, but if for some reason that
was missed, that could be an issue.

> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is
> ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option
> about MTRR?
>
> Question: If we assume your theory about mtrr/pat is correct, wouldn't
> the lockup/hang/reboot happen every time the system goes through
> hibernate/resume? Can this assumption explain why the first
> hibernation/resume cycles in rapid succession after system boot work
> and the long ones fail somewhat more consistently?
>
> Note: With PAT enabled the system boots up significantly faster.
>
> In the weekend I will return to 3.18-rc2 and I will try to verify my
> bisection is correct. Double guessing your self is a terrible thing...
>
> I will also try with nopat and I will run dmesg | grep -i mtr and post
> results
>
> Unless you have any other suggestions...

Bisection on the merge commit would help.

  Luis
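When several variable MTRRs cover the same address, the Intel SDM defines which memory type results: identical types give that type, any UC makes the result UC, WT together with WB gives WT, and other combinations are undefined. The sketch below applies those rules to the registers from the `cat /proc/mtrr` output posted in this thread (using its MB column). It ignores fixed-range MTRRs, the default type and how PAT combines with the MTRR result, and the enum values are local labels, not the hardware encodings.

```c
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

enum mem_type { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB, MT_UNDEFINED };

struct var_mtrr { uint64_t base, size; enum mem_type type; };

/* Variable MTRRs as listed by `cat /proc/mtrr` in this thread. */
static const struct var_mtrr posted[] = {
    { MB(0),     MB(16384), MT_WB },   /* reg00 */
    { MB(16384), MB(512),   MT_WB },   /* reg01 */
    { MB(3584),  MB(512),   MT_UC },   /* reg02 */
    { MB(3328),  MB(256),   MT_UC },   /* reg03 */
    { MB(3312),  MB(16),    MT_UC },   /* reg04 */
    { MB(16880), MB(16),    MT_UC },   /* reg05 */
    { MB(16878), MB(2),     MT_UC },   /* reg06 */
};

enum mem_type effective_type(const struct var_mtrr *r, int n, uint64_t addr)
{
    unsigned seen = 0;          /* bit t set => a matching range has type t */

    for (int i = 0; i < n; i++)
        if (addr >= r[i].base && addr - r[i].base < r[i].size)
            seen |= 1u << r[i].type;

    if (seen == 0)
        return MT_UNDEFINED;    /* no match: default type, not modeled */
    if ((seen & (seen - 1)) == 0) {         /* exactly one type matched */
        int t = 0;
        while (!(seen & (1u << t)))
            t++;
        return (enum mem_type)t;
    }
    if (seen & (1u << MT_UC))
        return MT_UC;           /* UC wins over any other type */
    if (seen == ((1u << MT_WT) | (1u << MT_WB)))
        return MT_WT;           /* WT + WB -> WT */
    return MT_UNDEFINED;        /* SDM: processor behavior undefined */
}
```

With this table an address at 3584MB is matched by reg00 (write-back) and reg02 (uncachable), so the variable MTRRs alone resolve it to UC; an address at 1024MB is matched only by reg00 and stays WB.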
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Thu, Nov 19, 2015 at 06:39:28AM +0100, Juergen Gross wrote:
> On 18/11/15 22:43, Vassilis Virvilis wrote:
> > Hi,
> >
> > I have been hit by a hibernate/resume bug. Other people may have too:
> > The following links are consistent with my observations
> >
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> > https://bugs.archlinux.org/task/44807
> >
> > Some observations:
> > 1) The first few rapid hibernation / resume cycles do not fail.
> >
> > 2) Having the computer loaded (eclipse + chromium + firefox/iceweasel +
> > thunderbird/icedove + Konsole) helps to reproduce the lockup during resume

Let's try to speed up reproducing this.

I have a hunch perhaps this might be related to some BIOS-controlled
MTRRs and a mismatch which then leads the kernel to think that a type of
MTRR write might be OK, when in fact it's not. Given the workload
description, perhaps this could be related to fan control, BIOS control
over it, and some other device's MTRR. More on this suspicion on another
thread where you provide more logs.

On a kernel that you know fails, can you try replacing this workload by
bringing your CPU to its knees quickly, perhaps with 'make -j' on Linux,
building for 2, 4, 8 or 16 minutes, and then hit CTRL-C and continue to
hibernation, to see if making the CPU fan trigger would accelerate the
issue. If 'make -j' is too nuts, to the point you can't even CTRL-C it,
try 'make -j 16'. Note that if this is true, then a hot CPU could still
trigger CPU fan controls on a fresh boot if the previous boot was CPU
intensive.

If this doesn't do it, let's try forcing an MTRR-capable driver; graphics
is the obvious target, so try perhaps some 3D stuff or a screen saver
prior to hibernation. Note that even if you boot with nomtrr the BIOS may
still use MTRRs, and PAT use on Linux could assume MTRRs are not being
used by drivers while the BIOS may still do something behind the scenes.
This is actually one reason why we can't simply remove MTRR support from
Linux: the BIOS may still do some wacky stuff with MTRRs. One example I
was given was that CPU fan control might use WC MTRRs, so the kernel must
be aware of this even if no MTRRs are ever used by the Linux kernel at
all, which is the case as of v4.3 and onwards.

If that doesn't help speed it up, maybe try a screen saver + some 3D
stuff + CPU-intensive stuff all together.

To help you speed up testing you can try reducing your build time by
reducing the amount of crap you have to build:

make localmodconfig

That should only build things your kernel has loaded as modules or
already has enabled (=y).

> > 3) Long hibernation times (overnight) help to reproduce the lockup
> > during resume
> >
> > 4) For the bad commits (where the lockup during resume takes place) -
> > the image loading during resume is significantly faster. It is fast and
> > then it locks.
> >
> > How I hit the problem and what I have done:
> >
> > I am running debian unstable
> >
> > Debian went from 3.16 to 3.19 - hence the problem raised its ugly head.
> > I upgraded diligently up to 4.2.6 - The problem persists
> >
> > I started kernel bisection from 3.16 to 3.19 following
> > https://wiki.debian.org/DebianKernel/GitBisect
> >
> > One month and 25 kernels later, see below for the bisect log
>
> Wow! Thanks for doing this work!

Vassilis, indeed, the amount of work you have put into this is extremely
appreciated!

> Juergen
>
> >
> > I hit some untestable kernels that weren't booting. They were hanging at
> > "Loading ramdisk..." before any actual kernel message.
> >
> > Looks like the first bad / untestable commit is from Juergen Gross /
> > Thomas Gleixner Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]
>

That is commit a023748d53c10850650fe86b1c4a7d421d576451
("Merge branch 'x86-mm-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Git is smart enough to tell you you've hit a merge commit and that all
the possible commits in that merge could be the issue. This is why your
bisect log shows a slew of commits. The next step is to bisect through
the commits in the merge; this will then let us identify the exact
commit that may have caused the issue.

There are a few ways to do this; my preferred way is to "unfold" a merge
commit manually.

To keep things separate (without affecting other tests you might have on
your other git tree, and to avoid forcing you to lose fresh objects as
you continue to build-test on the other tree), I'd do something like
this:

mkdir ~/tmp
git clone ~/linux/.git linux-dev-test
cd linux-dev-test

Notice how if you do git log and search for
a023748d53c10850650fe86b1c4a7d421d576451 you'll see that the commit
listed before this is 773fed910d41e443e495a6bfa9ab1c2b7b13e012
("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
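The bisection described here is a binary search: with commits ordered oldest to newest, a known good end and a known bad end, each test cycle halves the remaining range. A minimal C sketch, assuming that badness is monotonic (every commit after the first bad one is also bad); `is_bad` stands in for one build, boot and hibernate/resume cycle, and all names are hypothetical. With an intermittent failure like this one, a single bad commit wrongly judged good sends the search into the wrong half, and it then converges on the wrong commit.

```c
#include <stddef.h>

/* One manual test cycle on commit number idx; nonzero means "bad". */
typedef int (*test_fn)(size_t idx, void *ctx);

/* Find the first bad commit among n commits ordered oldest to newest.
 * Preconditions: n >= 2, commit 0 is good, commit n - 1 is bad, and
 * badness is monotonic. *runs receives the number of test cycles used. */
size_t first_bad(size_t n, test_fn is_bad, void *ctx, unsigned *runs)
{
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */

    *runs = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*runs;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Stand-in for a real test cycle: commits from *(size_t *)ctx onward are bad. */
int bad_from(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

For 100 commits whose first bad one is at index 37, `first_bad` returns 37 after 6 test cycles, which is why narrowing the search to the commits brought in by one merge is so much cheaper than testing them one by one.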
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 23/11/15 15:11, vas...@iit.demokritos.gr wrote:
> Ok I will send the .config when I get back home. I have all kernels I
> build in .deb archive. The problem is that the debian kernel build
> procedure does not store the git commit hash anywhere in the deb file.
>
> For which kernel would you care to see the config? 4.3?

Doesn't really matter anymore. I've posted a patch already to fix it and
got the reply that the fix is okay, but that no harm can come from the
current implementation, as the two config options are always either both
set or reset.

Juergen
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/20/2015 02:23 PM, Juergen Gross wrote:
>
> As the BIOS obviously isn't disabling MTRR I don't think we have
> to go that route any longer.

ok.

>>
>> In the weekend I will return to 3.18-rc2 and I will try to verify my
>> bisection is correct. Double guessing your self is a terrible thing...
>
> Thanks.
>
>> I will also try with nopat and I will run dmesg | grep -i mtr and post
>> results
>>
>> Unless you have any other suggestions...
>

I hit a very big problem here. I did

$git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
$make (with gcc 4.8 - as all my tests)

and the resulting kernel is unbootable, hanging at "Loading initial
ramdisk...", the second line of the kernel boot.

That means my bisection is not good, because this release is marked as
good. So now I am at a loss. As I said I followed
https://wiki.debian.org/DebianKernel/GitBisect

I notice now that the article suggests a step

$make oldconfig

I did it once at the start of the bisection and then answered the
default (Enter) to all config questions.

> I think we have to find out where the kernel is really hanging. Do you
> have any chance to trigger a NMI?

I am googling about it.

>
> Looking into suspend/resume code I found a strange inconsistency for
> the lapic handling:
>
> lapic_suspend()
> {
> ...
> #ifdef CONFIG_X86_THERMAL_VECTOR
>         if (maxlvt >= 5)
>                 apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
> #endif
> ...
> }
>
> lapic_resume()
> {
> ...
> #if defined(CONFIG_X86_MCE_INTEL)
>         if (maxlvt >= 5)
>                 apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
> #endif
> ...
> }
>
> and comparing that to:
>
> clear_local_APIC()
> {
> ...
> #ifdef CONFIG_X86_THERMAL_VECTOR
>         if (maxlvt >= 5) {
>                 v = apic_read(APIC_LVTTHMR);
>                 apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
>         }
> #endif
> #ifdef CONFIG_X86_MCE_INTEL
>         if (maxlvt >= 6) {
>                 v = apic_read(APIC_LVTCMCI);
>                 if (!(v & APIC_LVT_MASKED))
>                         apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
>         }
> #endif
> ...
> }
>

Ok I will send the .config when I get back home. I have all kernels I
build in .deb archive. The problem is that the debian kernel build
procedure does not store the git commit hash anywhere in the deb file.

For which kernel would you care to see the config? 4.3?

Vassilis
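The inconsistency Juergen points out is that the thermal LVT value is saved under CONFIG_X86_THERMAL_VECTOR but restored under CONFIG_X86_MCE_INTEL. The sketch below models the two #ifdef guards as runtime flags (a hypothetical simplification; in the kernel they are compile-time) to show what each combination would do across one suspend/resume. As noted elsewhere in the thread, the two options are always either both set or both unset, so only the first case arises in practice and the mismatch is latent rather than the cause of this hang.

```c
#include <stdint.h>

/* Hypothetical runtime stand-ins for the two Kconfig symbols (1 = set). */
struct cfg {
    int thermal_vector;     /* CONFIG_X86_THERMAL_VECTOR */
    int mce_intel;          /* CONFIG_X86_MCE_INTEL */
};

/* Thermal LVT value after one suspend/resume: 'before' is the value at
 * suspend time, 'clobbered' is whatever the register holds at resume. */
uint32_t thmr_after_resume(struct cfg c, int maxlvt,
                           uint32_t before, uint32_t clobbered)
{
    uint32_t saved = 0;     /* zero, as for a never-written static field */
    uint32_t reg = before;

    if (c.thermal_vector && maxlvt >= 5)    /* save side (lapic_suspend) */
        saved = reg;

    reg = clobbered;                        /* register state lost meanwhile */

    if (c.mce_intel && maxlvt >= 5)         /* restore side (lapic_resume) */
        reg = saved;

    return reg;
}
```

With both flags set the original value comes back; with only the save side the register keeps whatever it held at resume; with only the restore side a never-saved zero is written back.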
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Sat, Nov 21, 2015 at 01:49:06PM +0200, Vassilis Virvilis wrote: > On 11/20/2015 02:23 PM, Juergen Gross wrote: > >On 20/11/15 11:04, vas...@iit.demokritos.gr wrote: > >>>I've just found a potential issue: In case MTRR is disabled by the BIOS > >>>the PAT register of the boot processor won't be restored after resume. > >>> > >>>Can you check whether pr_info("MTRR: Disabled\n") has been executed in > >>>early boot? If yes, this might be a BIOS option. > >>> > >> > >>I don't have access right now. I will test it later tonight (This is my > >>home machine). > >> > >>Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr > >>somewere else e.g. /proc /sys etc? > > > >I think grepping for MTRR in dmesg should be enough. > > kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the familiar > (see previously attached image) "Calling lapic..." place. > > $dmesg | grep -i mtr for 4.3 kernel with notpat > [0.189113] calling mtrr_if_init+0x0/0x5f @ 1 > [0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs > [0.189222] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with > a huge-page mapping due to MTRR override. > [0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1 > [0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs > [8.994140] mtrr: type mismatch for e000,1000 old: write-back new: > write-combining > [8.994154] Failed to add WC MTRR for [e000-efff]; > performance may suffer. Its not clear from the log who called this MTRR call for WC that failed, I hope we didn't attempt a WC wright on a WB region. Who owns e000-efff ? What does your log show right before and after this? To find out try: dmesg | grep -5 -i mtrr Not being able to use WC is not fatal, its just a performance issue, but if we tried to override a region which we should not have to WC for which another area the BIOS might rely on to not be WC, that could be a big issue. 
> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled > [0.189368] calling mtrr_if_init+0x0/0x5f @ 1 > [0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs > [0.189478] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with > a huge-page mapping due to MTRR override. > [0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1 > [0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs The fact we don't see a conflict doesn't mean an issue or conflict didn't trigger. If PAT didn't see something the BIOS did that make the kernel assume it could do something that it was not able to. The MTRR init code should pick up on this stuff and let the kernel PAT code know if there could be a conflict, but if for some reason that was missed, that could be an issue. > I also checked my BIOS. I found nothing about mtrr. My BIOS manual is > ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about > MTRR? > > Question: If we assume your theory is correct about mtrr/pat, wouldn't > lockup/hang reboot every time the system goes to hibernate/resume? Can this > assumption explain why the first hibernation/resume cycles in rapid > succession after system boot are working and the long ones fail somewhat more > consistently? > > Note: With PAT enabled the system boots up significantly faster. > > In the weekend I will return to 3.18-rc2 and I will try to verify my > bisection is correct. Double guessing your self is a terrible thing... > > I will also try with nopat and I will run dmesg | grep -i mtr and post results > > Unless you have any other suggestions... Bisection on the merge commit would help. Luis -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/20/2015 02:23 PM, Juergen Gross wrote: > > As the BIOS obviously isn't disabling MTRR I don't think we have > to go that route any longer. ok. >> >> In the weekend I will return to 3.18-rc2 and I will try to verify my >> bisection is correct. Double guessing your self is a terrible thing... > > Thanks. > >> I will also try with nopat and I will run dmesg | grep -i mtr and post >> results >> >> Unless you have any other suggestions... > I hit a very big problem here. I did $git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012 $make (with gcc 4.8 - as all my tests) and the resulting kernel in unbootable hunging in "Loading initial ramdisk..." second line of the kernel boot That means my bisection is not good because this release is marked as good. So now I am at loss. As I said I followed https://wiki.debian.org/DebianKernel/GitBisect I notice now that the article suggest a step $make oldconfig I did it once at the start of the bisection and then answering the default (Enter) in all config questions. > I think we have to find out where the kernel is really hanging. Do you > have any chance to trigger a NMI? I am googling about it. > > Looking into suspend/resume code I found a strange inconsistency for > the lapic handling: > > lapic_suspend() > { > ... > #ifdef CONFIG_X86_THERMAL_VECTOR > if (maxlvt >= 5) > apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR); > #endif > ... > } > > lapic_resume() > { > ... > #if defined(CONFIG_X86_MCE_INTEL) > if (maxlvt >= 5) > apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr); > #endif > ... > } > > and comparing that to: > > clear_local_APIC() > { > ... > #ifdef CONFIG_X86_THERMAL_VECTOR > if (maxlvt >= 5) { > v = apic_read(APIC_LVTTHMR); > apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED); > } > #endif > #ifdef CONFIG_X86_MCE_INTEL > if (maxlvt >= 6) { > v = apic_read(APIC_LVTCMCI); > if (!(v & APIC_LVT_MASKED)) > apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED); > } > #endif > ... 
> } > Ok I will send the .config when I get back home. I have all kernels I build in .deb archive. The problem is that the debian kernel build procedure does not hold somewhere in the deb file the git commit hash. Fow which kernel would you care to see the config? 4.3? Vassilis -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 23/11/15 15:11, vas...@iit.demokritos.gr wrote: > Ok I will send the .config when I get back home. I have all kernels I > build in .deb archive. The problem is that the debian kernel build > procedure does not hold somewhere in the deb file the git commit hash. > > Fow which kernel would you care to see the config? 4.3? Doesn't really matter anymore. I've posted a patch already to fix it and got the reply, that the fix is okay, but no harm can come from the current implementation, as the two config options are always either both set or reset. Juergen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
> It's not clear from the log who made this WC MTRR call that failed; I
> hope we didn't attempt a WC write on a WB region. Who owns e000-efff?

How can I answer that? Is there a utility to run, or something to peek at inside /proc? Here is an idea:

$ dmesg | grep -i -5 e000
[0.220941] pci_bus :00: root bus resource [mem 0x000e4000-0x000e7fff window]
[0.220944] pci_bus :00: root bus resource [mem 0xdf20-0xfeaf window]
[0.220950] pci :00:00.0: [8086:0c00] type 00 class 0x06
[0.221012] pci :00:02.0: [8086:0412] type 00 class 0x03
[0.221021] pci :00:02.0: reg 0x10: [mem 0xf780-0xf7bf 64bit]
[0.221025] pci :00:02.0: reg 0x18: [mem 0xe000-0xefff 64bit pref]
[0.221028] pci :00:02.0: reg 0x20: [io 0xf000-0xf03f]
[0.221081] pci :00:03.0: [8086:0c0c] type 00 class 0x040300
[0.221089] pci :00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit]
[0.221163] pci :00:14.0: [8086:8cb1] type 00 class 0x0c0330
[0.221184] pci :00:14.0: reg 0x10: [mem 0xf7c2-0xf7c2 64bit]
--
[0.453765] calling ioapic_init_ops+0x0/0xf @ 1
[0.453767] initcall ioapic_init_ops+0x0/0xf returned 0 after 0 usecs
[0.453770] calling add_pcspkr+0x0/0x3b @ 1
[0.453781] initcall add_pcspkr+0x0/0x3b returned 0 after 8 usecs
[0.453783] calling sysfb_init+0x0/0x96 @ 1
[0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe000, 0x6bb000 bytes, mapped to 0xc9000200
[0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
[0.557233] Console: switching to colour frame buffer device 210x65
[0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
[0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs
[0.661266] calling audit_classes_init+0x0/0xaa @ 1
--
[9.744397] input: gspca_zc3xx as /devices/pci:00/:00:14.0/usb3/3-3/input/input18
[9.744481] usbcore: registered new interface driver gspca_zc3xx
[9.744484] initcall sd_driver_init+0x0/0x1000 [gspca_zc3xx] returned 0 after 319 usecs
[9.745108] calling i915_init+0x0/0xa2 [i915] @ 403
[9.745542] [drm] Memory usable by graphics device = 2048M
[9.745544] checking generic (e000 6bb000) vs hw (e000 1000)
[9.745544] fb: switching to inteldrmfb from simple
[9.745831] calling alsa_seq_device_init+0x0/0x1000 [snd_seq_device] @ 384
[9.745842] initcall alsa_seq_device_init+0x0/0x1000 [snd_seq_device] returned 0 after 9 usecs
[9.746179] calling hmac_module_init+0x0/0x1000 [hmac] @ 471
[9.746180] initcall hmac_module_init+0x0/0x1000 [hmac] returned 0 after 0 usecs
--
[9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[9.751163] usbcore: registered new interface driver snd-usb-audio
[9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[9.943166] Console: switching to colour dummy device 80x25
[9.943240] [drm] Replacing VGA console driver
[9.943520] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
[9.943526] Failed to add WC MTRR for [e000-efff]; performance may suffer.
[9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS
[9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[9.949728] [drm] Driver supports precise vblank timestamp query.
[9.949801] vgaarb: device changed decodes: PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

$ lspci | grep 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

It looks like it is the graphics card or the graphics driver.

I don't know if this is relevant:

$ cat /proc/mtrr
reg00: base=0x0 (0MB), size=16384MB, count=1: write-back
reg01: base=0x4 (16384MB), size= 512MB, count=1: write-back
reg02: base=0x0e000 ( 3584MB), size= 512MB, count=1: uncachable
reg03: base=0x0d000 ( 3328MB), size= 256MB, count=1: uncachable
reg04: base=0x0cf00 ( 3312MB), size= 16MB, count=1: uncachable
reg05: base=0x41f00 (16880MB), size= 16MB, count=1: uncachable
reg06: base=0x41ee0 (16878MB), size=2MB, count=1: uncachable

> What does your log show right before and after this? To find out try:
> dmesg | grep -5 -i mtrr

See full dmesg attached.

$ dmesg | grep -5 -i mtrr
[0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
[0.189336] calling pt_init+0x0/0x2a4 @ 1
[0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
[0.189352] calling bts_init+0x0/0xa4 @ 1
[0.189354] initcall bts_init+0x0/0xa4
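The simplefb lines above help answer "who owns e000-efff". The boot framebuffer sits in the integrated graphics device's prefetchable BAR (reg 0x18 of 00:02.0). Its reported size, 0x6bb000 bytes, can be checked against the mode line:

- 1680 pixels at 32 bpp is 1680 × 4 = 6720 bytes per line, which is the logged linelength.
- 6720 × 1050 lines = 7,056,000 bytes.
- Rounding that up to a whole 4 KiB page gives 7,057,408 = 0x6bb000.

The page rounding is an assumption that happens to match the log, and the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FB_PAGE_SIZE 4096u   /* assumption: 4 KiB pages, as on x86 */

/* Round len up to the next multiple of FB_PAGE_SIZE (a power of two). */
static uint64_t fb_page_align(uint64_t len)
{
    return (len + FB_PAGE_SIZE - 1) & ~(uint64_t)(FB_PAGE_SIZE - 1);
}

/* Bytes needed for a framebuffer of `height` lines of `linelength` bytes,
 * rounded up to whole pages. */
static uint64_t fb_bytes(uint32_t height, uint32_t linelength)
{
    assert(height > 0 && linelength > 0);
    return fb_page_align((uint64_t)height * linelength);
}
```

fb_bytes(1050, 6720) evaluates to 0x6bb000, which matches the simplefb line. This supports the reading that the write-combining request for this range comes from the graphics side. The later "fb: switching to inteldrmfb from simple" line fits the same picture: there i915 appears to take over the same framebuffer.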
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On Thu, Nov 19, 2015 at 06:39:28AM +0100, Juergen Gross wrote:
> On 18/11/15 22:43, Vassilis Virvilis wrote:
> > Hi,
> >
> > I have been hit by a hibernate/resume bug. Other people may have too:
> > The following links are consistent with my observations
> >
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> > https://bugs.archlinux.org/task/44807
> >
> > Some observations:
> > 1) The first few rapid hibernation / resume cycles do not fail.
> >
> > 2) If the computer is loaded (eclipse + chromium + firefox/iceweasel +
> > thunderbird/icedove + Konsole) helps to reproduce and lock up during resume

Let's try to speed up reproducing this. I have a hunch this might be related to some BIOS-controlled MTRRs and a mismatch that leads the kernel to think a given type of MTRR write is OK when in fact it's not. Given the workload description, perhaps this is related to the BIOS's control of the fans clashing with some other device's MTRR. More on this suspicion in the other thread where you provided more logs.

On a kernel that you know fails, can you try replacing this workload with something that brings your CPU to its knees quickly? For example, run 'make -j' on a Linux build for 2, 4, 8 or 16 minutes, then hit CTRL-C and continue to hibernation. That would show whether getting the CPU fan going accelerates the issue. If 'make -j' is so heavy that you can't even CTRL-C it, try 'make -j 16'. Note that if this theory holds, a hot CPU could still trigger CPU fan control on a fresh boot if the previous boot was CPU-intensive.

If this doesn't do it, let's try exercising an MTRR-capable driver. Graphics is the obvious target, so perhaps run some 3D stuff or a screen saver prior to hibernation. Note that even if you boot with nomtrr, the BIOS may still use MTRRs. PAT use on Linux could then assume MTRRs are not being used by drivers while the BIOS still does something behind the scenes.

This is actually one reason why we can't simply remove MTRR support from Linux: the BIOS may still do some wacky stuff with MTRRs. One example I was given was that CPU fan control might use WC MTRRs. So the kernel must be aware of this even if no MTRRs are ever used by the Linux kernel itself, which is the case as of v4.3 onwards.

If that doesn't speed it up, maybe try a screen saver + some 3D stuff + CPU-intensive stuff all together.

To speed up testing, you can also cut your build time by reducing the amount of crap you have to build:

make localmodconfig

That should only build what your kernel currently has loaded as modules or already has enabled (=y).

> > 3) Long hibernation times (overnight) helps to reproduce and lock up
> > during resume
> >
> > 4) For the bad commits (where the lockup during resume takes place) -
> > the image loading during resume is significantly faster. It is fast and
> > then it locks.
> >
> > How I hit the problem and what I have done:
> >
> > I am running debian unstable
> >
> > Debian went from 3.16 to 3.19 - hence the problem raised its ugly head.
> > I upgraded diligently up to 4.2.6 - The problem persists
> >
> > I started kernel bisection from 3.16 to 3.19 following
> > https://wiki.debian.org/DebianKernel/GitBisect
> >
> > One month and 25 kernels later see below for the bisect log
>
> Wow! Thanks for doing this work!

Vassilis, indeed, the amount of work you have put into this is extremely appreciated!

> Juergen
>
> >
> > I hit some untestable kernels that weren't booting. They were hanging at
> > "Loading ramdisk..." before any actual kernel message.
> >
> > Looks like the first bad / untestable commit is from Juergen Gross /
> > Thomas Gleixner Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]
>

That is commit a023748d53c10850650fe86b1c4a7d421d576451 ("Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").

Git is smart enough to tell you that you've hit a merge commit, and that any of the commits brought in by that merge could be the culprit. This is why your bisect log shows a slew of commits. The next step is to bisect through the commits of the merge itself. That will let us identify the exact commit that may have caused the issue.

There are a few ways to do this; my preferred way is to "unfold" a merge commit manually. I'd keep this work separate, so that it doesn't affect other tests you might have in your other git tree and you don't lose freshly built objects as you continue to build and test there. Something like this:

mkdir ~/tmp
cd ~/tmp
git clone ~/linux/.git linux-dev-test
cd linux-dev-test

Notice that if you run git log and search for a023748d53c10850650fe86b1c4a7d421d576451, you'll see that the commit listed before it is 773fed910d41e443e495a6bfa9ab1c2b7b13e012 ("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
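Bisection halves the range of candidate commits with every test build. That is why a month of work over 25 kernels can narrow a whole release cycle down to one merge. Below is a minimal sketch of that search on a linear list of commits, under these assumptions:

- It is a simplification. Real git bisect walks the commit graph, which is exactly why it can stop at a merge such as a023748d whose individual commits still have to be unfolded as described above.
- The first entry is known good, the last is known bad, and the bug stays present once introduced.
- `first_bad()` and `example_is_bad()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: nonzero if the kernel built from commit
 * index i shows the bug (in practice: build, install, hibernate, resume). */
typedef int (*is_bad_fn)(size_t i);

/* Binary search for the first bad commit in a linear history of n commits.
 * Assumes index 0 is good, index n-1 is bad, and the bug never disappears
 * once introduced. *steps receives the number of test builds performed. */
static size_t first_bad(size_t n, is_bad_fn is_bad, unsigned *steps)
{
    assert(n >= 2 && is_bad != NULL && steps != NULL);
    size_t good = 0, bad = n - 1;   /* invariant: good < bad */
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*steps;
        if (is_bad(mid))
            bad = mid;   /* culprit is at mid or earlier */
        else
            good = mid;  /* culprit is after mid */
    }
    return bad;
}

/* Hypothetical example: pretend the regression landed at index 37. */
static int example_is_bad(size_t i)
{
    return i >= 37;
}
```

With 100 commits the search needs at most 7 test builds. In the example above it finds index 37 after 6 builds. An untestable kernel, like the ones hanging at "Loading ramdisk...", breaks the good/bad assumption. That is what `git bisect skip` exists for.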
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 21/11/15 12:49, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
>> On 20/11/15 11:04, vas...@iit.demokritos.gr wrote:
>>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>>> the PAT register of the boot processor won't be restored after resume.
>>>>
>>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>>> early boot? If yes, this might be a BIOS option.
>>>
>>> I don't have access right now. I will test it later tonight (This is my
>>> home machine).
>>>
>>> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
>>> somewhere else e.g. /proc /sys etc?
>>
>> I think grepping for MTRR in dmesg should be enough.
>
> kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the
> familiar (see previously attached image) "Calling lapic..." place.
>
> $dmesg | grep -i mtr for 4.3 kernel with nopat
> [0.189113] calling mtrr_if_init+0x0/0x5f @ 1
> [0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [0.189222] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820]
> with a huge-page mapping due to MTRR override.
> [0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
> [0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
> [8.994140] mtrr: type mismatch for e000,1000 old: write-back
> new: write-combining
> [8.994154] Failed to add WC MTRR for
> [e000-efff]; performance may suffer.
>
> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> [0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [0.189478] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820]
> with a huge-page mapping due to MTRR override.
> [0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> [0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
>
>
> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is
> ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option
> about MTRR?

As the BIOS obviously isn't disabling MTRR I don't think we have
to go that route any longer.

> Question: If we assume your theory is correct about mtrr/pat, wouldn't
> lockup/hang/reboot happen every time the system goes to hibernate/resume? Can
> this assumption explain why the first hibernation/resume cycles in rapid
> succession after system boot are working and the long ones fail somewhat
> more consistently?

Hmm, I'm really not sure. It would depend on the usage of non-standard
cache mode mappings. But as MTRR isn't disabled this theory won't apply
to your problem.

> Note: With PAT enabled the system boots up significantly faster.
>
> In the weekend I will return to 3.18-rc2 and I will try to verify my
> bisection is correct. Second-guessing yourself is a terrible thing...

Thanks.

> I will also try with nopat and I will run dmesg | grep -i mtr and post
> results
>
> Unless you have any other suggestions...

I think we have to find out where the kernel is really hanging. Do you
have any chance to trigger a NMI?

Looking into suspend/resume code I found a strange inconsistency for
the lapic handling:

lapic_suspend()
{
    ...
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5)
        apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
    ...
}

lapic_resume()
{
    ...
#if defined(CONFIG_X86_MCE_INTEL)
    if (maxlvt >= 5)
        apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
#endif
    ...
}

and comparing that to:

clear_local_APIC()
{
    ...
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5) {
        v = apic_read(APIC_LVTTHMR);
        apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
    }
#endif
#ifdef CONFIG_X86_MCE_INTEL
    if (maxlvt >= 6) {
        v = apic_read(APIC_LVTCMCI);
        if (!(v & APIC_LVT_MASKED))
            apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
    }
#endif
    ...
}

I think it would be interesting to know your kernel config...

Juergen
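To make the lapic inconsistency concrete, here is a userspace model of what happens when the save path and the restore path are guarded by different config options. It is not the patch Juergen posted. Assumptions:

- The build is hypothetical: CONFIG_X86_THERMAL_VECTOR set and CONFIG_X86_MCE_INTEL unset.
- As Juergen notes elsewhere in this thread, upstream the two options are always either both set or both unset. This combination therefore should not occur in practice, and the mismatch is harmless there.
- The register variables and the model_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration for illustration only: thermal vector
 * support on, Intel MCE support off. */
#define CONFIG_X86_THERMAL_VECTOR 1
/* CONFIG_X86_MCE_INTEL deliberately left undefined */

static uint32_t lvtthmr;      /* stand-in for the APIC_LVTTHMR register */
static uint32_t saved_thmr;   /* stand-in for apic_pm_state.apic_thmr */

/* Save guarded by CONFIG_X86_THERMAL_VECTOR, as in lapic_suspend(). */
static void model_suspend(int maxlvt)
{
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5)
        saved_thmr = lvtthmr;
#endif
}

/* Restore guarded by a different option, as in the quoted lapic_resume():
 * with the configuration above the body compiles to nothing. */
static void model_resume_mismatched(int maxlvt)
{
    (void)maxlvt;   /* unused when the guarded block is compiled out */
#if defined(CONFIG_X86_MCE_INTEL)
    if (maxlvt >= 5)
        lvtthmr = saved_thmr;
#endif
}

/* Restore guarded by the same option as the save. */
static void model_resume_consistent(int maxlvt)
{
#ifdef CONFIG_X86_THERMAL_VECTOR
    if (maxlvt >= 5)
        lvtthmr = saved_thmr;
#endif
}
```

With matching guards, the value saved on suspend is always written back on resume. With mismatched guards, a configuration that enables only one of the two options would silently skip the restore and leave whatever value the register held after resume.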
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/20/2015 02:23 PM, Juergen Gross wrote:
> On 20/11/15 11:04, vas...@iit.demokritos.gr wrote:
>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>> the PAT register of the boot processor won't be restored after resume.
>>>
>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>> early boot? If yes, this might be a BIOS option.
>>
>> I don't have access right now. I will test it later tonight (This is my
>> home machine).
>>
>> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
>> somewhere else e.g. /proc /sys etc?
>
> I think grepping for MTRR in dmesg should be enough.

Kernel 4.3 + nopat also died on the 4th or 5th hibernate, at the familiar "Calling lapic..." place (see previously attached image).

$ dmesg | grep -i mtr (4.3 kernel with nopat):
[0.189113] calling mtrr_if_init+0x0/0x5f @ 1
[0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[0.189222] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with a huge-page mapping due to MTRR override.
[0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
[0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
[8.994140] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
[8.994154] Failed to add WC MTRR for [e000-efff]; performance may suffer.

$ dmesg | grep -i mtr (4.3 kernel with PAT enabled, the default):
[0.189368] calling mtrr_if_init+0x0/0x5f @ 1
[0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[0.189478] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with a huge-page mapping due to MTRR override.
[0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
[0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

I also checked my BIOS and found nothing about MTRR. My BIOS manual is ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about MTRR?

Question: If we assume your theory about MTRR/PAT is correct, wouldn't the lockup/hang/reboot happen every time the system hibernates and resumes? Can this assumption explain why the first hibernation/resume cycles in rapid succession after boot work, while the long ones fail somewhat more consistently?

Note: With PAT enabled the system boots up significantly faster.

Over the weekend I will return to 3.18-rc2 and try to verify that my bisection is correct. Second-guessing yourself is a terrible thing...

I will also try with nopat, run dmesg | grep -i mtr and post the results.

Unless you have any other suggestions...

Vassilis
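The "mtrr: type mismatch ... old: write-back new: write-combining" line can be reproduced with a simplified model of how a new variable-range MTRR request is checked against existing ones. The model is loosely based on the kernel's mtrr_add_page() but is not a copy of it. Assumptions:

- A request fully enclosed by an existing range must have a compatible type. Here that means the same type, or either side uncachable.
- Partial overlaps are simply rejected.
- The ranges are the /proc/mtrr entries posted in this thread, in MB.
- The 16 MB size of the WC request is hypothetical, because the size in the log is truncated.
- The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of MTRR memory types. */
enum mtype { MT_UC, MT_WC, MT_WB };

struct var_range { uint64_t base, size; enum mtype type; };

/* /proc/mtrr from this thread, in MB: reg00..reg06. */
static const struct var_range example_regs[] = {
    {     0, 16384, MT_WB },
    { 16384,   512, MT_WB },
    {  3584,   512, MT_UC },
    {  3328,   256, MT_UC },
    {  3312,    16, MT_UC },
    { 16880,    16, MT_UC },
    { 16878,     2, MT_UC },
};

/* Simplified rule: same type, or either side uncachable. */
static int compatible(enum mtype a, enum mtype b)
{
    return a == b || a == MT_UC || b == MT_UC;
}

/* Returns 0 if [base, base+size) may get `type`, -1 on a type mismatch
 * with an enclosing range, -2 on a partial overlap. On failure *conflict
 * is set to the index of the first offending range. */
static int check_request(const struct var_range *regs, int n,
                         uint64_t base, uint64_t size, enum mtype type,
                         int *conflict)
{
    assert(size > 0);
    uint64_t end = base + size - 1;
    for (int i = 0; i < n; i++) {
        if (regs[i].size == 0)
            continue;
        uint64_t lend = regs[i].base + regs[i].size - 1;
        if (base > lend || end < regs[i].base)
            continue;                     /* no overlap */
        if (base < regs[i].base || end > lend) {
            *conflict = i;
            return -2;                    /* partial overlap */
        }
        if (!compatible(type, regs[i].type)) {
            *conflict = i;
            return -1;                    /* "type mismatch" */
        }
    }
    return 0;
}
```

A WC request at 3584 MB first hits reg00, the write-back range covering all 16 GB. That is why the model, like the log, reports the old type as write-back. An uncachable request at the same address passes.

The message only shows up in the nopat boot above. This is consistent with the kernel falling back to a WC MTRR for the framebuffer only when PAT is unavailable, though that reading is an inference, not something the logs prove.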
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 20/11/15 11:04, vas...@iit.demokritos.gr wrote:
>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>> the PAT register of the boot processor won't be restored after resume.
>>
>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>> early boot? If yes, this might be a BIOS option.
>>
>
> I don't have access right now. I will test it later tonight (This is my
> home machine).
>
> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
> somewhere else e.g. /proc /sys etc?

I think grepping for MTRR in dmesg should be enough.

Juergen
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
> I've just found a potential issue: In case MTRR is disabled by the BIOS
> the PAT register of the boot processor won't be restored after resume.
>
> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> early boot? If yes, this might be a BIOS option.
>

I don't have access right now. I will test it later tonight (this is my home machine).

Would $ dmesg | grep -i mtrr suffice, or do I need to look for the MTRR somewhere else, e.g. /proc or /sys?
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 20/11/15 06:25, Vassilis Virvilis wrote:
> On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
>>
>> I compiled and I am running 4.3 right now.
>>
>
> It failed this morning. Last night I did 3 hibernate / resume cycles. In
> the last one I also turned off the PSU (this seems to push it over the
> edge - but it may be random behavior) and it worked. This morning 7h
> later failed to resume - but it didn't hang on _lapic_resume. This time
> it rebooted - and I seem to recall this behavior for 4.2+ kernels. I
> forgot to mention it because my testing with 4.x kernels was one month
> before.
>
> So 4.3 kernel - reboots on resume after a long hibernation time.
>
> I am testing with 4.3 and nopat right now.

I've just found a potential issue: In case MTRR is disabled by the BIOS
the PAT register of the boot processor won't be restored after resume.

Can you check whether pr_info("MTRR: Disabled\n") has been executed in
early boot? If yes, this might be a BIOS option.

Juergen
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
> I compiled and I am running 4.3 right now.

It failed this morning. Last night I did 3 hibernate/resume cycles. In the last one I also turned off the PSU (this seems to push it over the edge, but it may be random behavior), and it worked. This morning, 7 hours later, it failed to resume, but it didn't hang in _lapic_resume. This time it rebooted, and I seem to recall this behavior with 4.2+ kernels. I forgot to mention it because my testing with 4.x kernels was a month earlier.

So the 4.3 kernel reboots on resume after a long hibernation time.

I am testing with 4.3 and nopat right now.

Vassilis
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/19/2015 11:10 AM, Juergen Gross wrote:
>> So do you want me to test 4.3 or the 4.4-pre/rc*/latest Linus tree? I
>> assume 4.3 for now.
>
> I think 4.3 is okay.
>
>> I will do it later tonight. It will take 2 days at least to report back

I compiled 4.3 and I am running it right now. If it fails, I will try it with
the nopat option. If that fails too, I will try 3.18-rc2+nopat to see if that
fails.

>> Do you want me to run something on this, like lspci or lsusb?
>
> Yes, please post the output of both.

Here they are; see the attachments.

>> I would like this to be fixed so I am willing to do the testing.
>
> I appreciate this spirit. :-)

I appreciate the guidance. :-)

Vassilis

Bus 004 Device 002: ID 8087:8001 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8009 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series
Bus 001 Device 003: ID 045e:0745 Microsoft Corp. Nano Transceiver v1.0 for Bluetooth
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 004 Device 002: ID 8087:8001 Intel Corp.
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         1 Single TT
  bMaxPacketSize0        64
  idVendor           0x8087 Intel Corp.
  idProduct          0x8001
  bcdDevice            0.00
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval              12
Hub Descriptor:
  bLength              11
  bDescriptorType      41
  nNbrPorts             8
  wHubCharacteristic 0x0009
    Per-port power switching
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood        0 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00 0x00
  PortPwrCtrlMask    0xff 0xff
 Hub Port Status:
   Port 1: .0100 power
   Port 2: .0100 power
   Port 3: .0100 power
   Port 4: .0100 power
   Port 5: .0100 power
   Port 6: .0100 power
   Port 7: .0100 power
   Port 8: .0100 power
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0001
  Self Powered

Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0002 2.0 root hub
  bcdDevice            4.03
  iManufacturer           3 Linux 4.3.0+ ehci_hcd
  iProduct                2 EHCI Host Controller
  iSerial                 1 :00:1d.0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
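Juergen asked for the hardware details in order to look for similarities with the other two reports. As an illustration only, here is a minimal Python sketch, assuming the plain (non-verbose) `lsusb` summary format shown above (`Bus BBB Device DDD: ID vvvv:pppp description`), that extracts the vendor:product IDs from two listings and intersects them. `parse_lsusb` and `shared_ids` are hypothetical helpers, not part of usbutils.

```python
import re

# One line of plain `lsusb` output, e.g.
# "Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series"
LSUSB_LINE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(text):
    """Return (bus, device, vendor_id, product_id, description) tuples.

    Lines that do not match the summary format (e.g. `lsusb -v` detail
    lines) are ignored.
    """
    devices = []
    for line in text.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m:
            bus, dev, vid, pid, desc = m.groups()
            devices.append((int(bus), int(dev), vid, pid, desc.strip()))
    return devices


def shared_ids(text_a, text_b):
    """vendor:product IDs present in both listings, ignoring root hubs."""
    def ids(text):
        # Vendor 1d6b (Linux Foundation) is the kernel's virtual root hub,
        # which every machine has, so it says nothing about the hardware.
        return {f"{vid}:{pid}" for _, _, vid, pid, _ in parse_lsusb(text)
                if vid.lower() != "1d6b"}
    return ids(text_a) & ids(text_b)
```

Filtering out the root hubs is a design choice of this sketch; any real comparison of the three machines would also need the `lspci` output, which is where chipset similarities are more likely to show up.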
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 19/11/15 08:50, vas...@iit.demokritos.gr wrote:
> Hi,
>
> Thanks for the quick answer
>
>>
>> Could you please try the most recent 4.3 kernel? There has been some
>> work related to this topic after 4.2 (large page pat handling done by
>> Toshi Kani and mtrr/pat handling by Luis Rodriguez).
>
> That means I will reset the bisection. Right? Is there any other info we
> can extract from there?

I don't think there is anything else specific to that patch beyond the
fact that the issue started with it. All further diagnostic information
should be obtainable with a newer kernel, too.

> So do you want me to test 4.3 or the 4.4-pre/rc*/latest Linus tree? I
> assume 4.3 for now.

I think 4.3 is okay.

> I will do it later tonight. It will take 2 days at least to report back

Okay, thank you for your effort!

>
>>
>> Another interesting piece of information would be the exact hardware you
>> are using. Maybe we can see some similarities between yours and the
>> other two cases you referenced above.
>>
>
> It is an i7
> Motherboard: ASROCK H97 PRO4 RETAIL
> CPU INTEL CORE I7-4790 3.60GHZ LGA1150 - BOX
> It has 16GB of RAM, one SSD and one HDD
> I have NO external graphics card
>
> Do you want me to run something on this, like lspci or lsusb?

Yes, please post the output of both.

> I upgraded the BIOS of the motherboard to the latest. This is not the
> problem though, because I upgraded after the problem occurred, as a
> countermeasure in case I was hit by a buggy BIOS and Linux had changed
> its behavior to be stricter.

BIOS was my first guess, but if the other two reports are really due to
the same problem, I doubt the BIOS is to blame (one Lenovo and one Sony
laptop).

> I experimented with ACPI compilers/decompilers and I was tempted to fix my
> ACPI tables but I didn't.
>
> I saw the kernel command line option acpi_osi=!Windows2013 but I didn't
> try it. Do you think I should try it?

You could try "nopat" as a command line option.

>
>> Wow! Thanks for doing this work!
>>
>
> I would like this to be fixed so I am willing to do the testing.

I appreciate this spirit. :-)

Juergen
-- 
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
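Because each test round depends on the kernel really having been booted with `nopat` (or, later, an `acpi_osi=` override), it can help to confirm from the running system that the option reached the kernel, for example by reading `/proc/cmdline` after a resume. A minimal Python sketch, assuming whitespace-separated parameters with unquoted values (quoted values containing spaces are not handled); `kernel_param` and `running_cmdline` are hypothetical helper names:

```python
def kernel_param(cmdline, name):
    """Look up a boot parameter in a kernel command line string.

    Returns True for a bare flag such as "nopat", the value string for
    "name=value", or None if the parameter is absent. If the parameter is
    repeated, this sketch simply keeps the last occurrence.
    Assumption: tokens are split on whitespace only; no quote handling.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else True
    return found


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux)."""
    with open(path) as f:
        return f.read().strip()
```

Usage on the test machine would be `kernel_param(running_cmdline(), "nopat")`, which should return `True` for the "4.3 + nopat" and "3.18-rc2 + nopat" rounds and `None` otherwise.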
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
> I compiled and I am running 4.3 right now.

It failed this morning.

Last night I did 3 hibernate / resume cycles. In the last one I also turned
off the PSU (this seems to push it over the edge - but it may be random
behavior) and it worked.

This morning, 7h later, it failed to resume - but it didn't hang on
_lapic_resume. This time it rebooted - and I seem to recall this behavior for
4.2+ kernels. I forgot to mention it because my testing with 4.x kernels was
a month earlier.

So the 4.3 kernel reboots on resume after a long hibernation time.

I am testing with 4.3 and nopat right now.

Vassilis
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
Hi,

Thanks for the quick answer

>
> Could you please try the most recent 4.3 kernel? There has been some
> work related to this topic after 4.2 (large page pat handling done by
> Toshi Kani and mtrr/pat handling by Luis Rodriguez).

That means I will reset the bisection. Right? Is there any other info we
can extract from there?

So do you want me to test 4.3 or the 4.4-pre/rc*/latest Linus tree? I
assume 4.3 for now.

I will do it later tonight. It will take 2 days at least to report back

>
> Another interesting piece of information would be the exact hardware you
> are using. Maybe we can see some similarities between yours and the
> other two cases you referenced above.
>

It is an i7
Motherboard: ASROCK H97 PRO4 RETAIL
CPU INTEL CORE I7-4790 3.60GHZ LGA1150 - BOX
It has 16GB of RAM, one SSD and one HDD
I have NO external graphics card

Do you want me to run something on this, like lspci or lsusb?

I upgraded the BIOS of the motherboard to the latest. This is not the
problem though, because I upgraded after the problem occurred, as a
countermeasure in case I was hit by a buggy BIOS and Linux had changed
its behavior to be stricter.

I experimented with ACPI compilers/decompilers and I was tempted to fix my
ACPI tables but I didn't.

I saw the kernel command line option acpi_osi=!Windows2013 but I didn't
try it. Do you think I should try it?

> Wow! Thanks for doing this work!
>

I would like this to be fixed so I am willing to do the testing.

Vassilis
Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
On 18/11/15 22:43, Vassilis Virvilis wrote:
> Hi,
>
> I have been hit by a hibernate/resume bug. Other people may have been too.
> The following links are consistent with my observations:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> https://bugs.archlinux.org/task/44807
>
> Some observations:
> 1) The first few rapid hibernation / resume cycles do not fail.
>
> 2) Having the computer loaded (eclipse + chromium + firefox/iceweasel +
> thunderbird/icedove + Konsole) helps to reproduce the lockup during resume.
>
> 3) Long hibernation times (overnight) help to reproduce the lockup
> during resume.
>
> 4) For the bad commits (where the lockup during resume takes place),
> the image loading during resume is significantly faster. It is fast and
> then it locks.
>
> How I hit the problem and what I have done:
>
> I am running Debian unstable.
>
> Debian went from 3.16 to 3.19 - hence the problem reared its ugly head.
> I upgraded diligently up to 4.2.6 - the problem persists.

Could you please try the most recent 4.3 kernel? There has been some
work related to this topic after 4.2 (large page pat handling done by
Toshi Kani and mtrr/pat handling by Luis Rodriguez).

Another interesting piece of information would be the exact hardware you
are using. Maybe we can see some similarities between yours and the
other two cases you referenced above.

> I added no_console_suspend initcall_debug to the kernel command line -
> see attached image of the lockup.
>
> I added drm.debug=0xe but it didn't produce anything interesting (ok,
> who am I to judge?) and the runs did not have it, so I took it out
> again.
>
> I reproduced with hibernating and resuming back to KDE and/or back to
> the text console.
>
> I switched to the VGA console and the resume problem persists.
>
> I started kernel bisection from 3.16 to 3.19 following
> https://wiki.debian.org/DebianKernel/GitBisect
>
> One month and 25 kernels later, see below for the bisect log.

Wow! Thanks for doing this work!

Juergen

>
> I hit some untestable kernels that weren't booting. They were hanging at
> "Loading ramdisk..." before any actual kernel message.
>
> Looks like the first bad / untestable commit is from Juergen Gross /
> Thomas Gleixner Merge branch 'x86-mm-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]
>
> Full disclaimer: I may have fucked up the bisection. Finding bad commits
> was semi-easy - finding good commits needs a run time of 2-3 days.
>
> I would really appreciate some help and directions to nail this down.
>
>
> Regards
>
> Vassilis Virvilis
>
>
>
> bill@localhost:~/Downloads/linux$ git bisect log
> git bisect start
> # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
> git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> # bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
> git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
> # good: [754c780953397dd5ee5191b7b3ca67e09088ce7a] Merge branch
> 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
> git bisect good 754c780953397dd5ee5191b7b3ca67e09088ce7a
> # bad: [7ef58b32f571bffb7763c6252ad7527562081f34] Merge tag
> 'devicetree-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
> git bisect bad 7ef58b32f571bffb7763c6252ad7527562081f34
> # good: [53429290a054b30e4683297409fc4627b2592315] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
> git bisect good 53429290a054b30e4683297409fc4627b2592315
> # good: [3a647c1d7ab08145cee4b650f5e797d168846c51] Merge tag
> 'drivers-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> git bisect good 3a647c1d7ab08145cee4b650f5e797d168846c51
> # bad: [1366f5d3129f2abde606214de7afc3dd61781fa3] Merge branch
> 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
> git bisect bad 1366f5d3129f2abde606214de7afc3dd61781fa3
> # good: [151cd97630f87451cab412e40750d0e5f7581c98] Merge tag
> 'defconfig-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> git bisect good 151cd97630f87451cab412e40750d0e5f7581c98
> # good: [ecb50f0afd35a51ef487e8a54b976052eb03d729] Merge branch
> 'irq-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good ecb50f0afd35a51ef487e8a54b976052eb03d729
> # bad: [3a5dc1fafb016560315fe45bb4ef8bde259dd1bc] Merge branch
> 'x86-microcode-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 3a5dc1fafb016560315fe45bb4ef8bde259dd1bc
> # good: [b6444bd0a18eb47343e16749ce80a6ebd521f124] Merge branch
> 'x86-boot-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good b6444bd0a18eb47343e16749ce80a6ebd521f124
> # bad: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch
> 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad a023748d53c10850650fe86b1c4a7d421d576451
> # good:
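The bisection in the log above is a binary search for the first bad commit, complicated by two things described in the report: some kernels are untestable (they hang at "Loading ramdisk..."), and a "good" verdict is only trustworthy after 2-3 days without a failed resume, while a "bad" verdict can arrive quickly. A minimal Python sketch of that search, under simplifying assumptions: it works on a linear, oldest-first list of commits, whereas real `git bisect` walks the merge graph, and the "skip" verdict here plays the role of `git bisect skip`. `first_bad` and the verdict strings are hypothetical names.

```python
def first_bad(commits, verdict):
    """Binary search for the first bad commit, tolerating untestable ones.

    Assumptions (a simplification of what `git bisect` does):
      * commits is a linear, oldest-first list;
      * commits[0] is known good and commits[-1] is known bad;
      * the bug is monotone: once a commit is bad, all later ones are bad;
      * verdict(commit) returns "good", "bad" or "skip" (cannot be tested).

    Returns the commits that could be the first bad one: a single commit
    normally, or several when untestable commits hide the boundary.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    skipped = set()
    while True:
        testable = [i for i in range(lo + 1, hi) if i not in skipped]
        if not testable:
            # Everything strictly between lo and hi is untestable (or there
            # is nothing left), so the first bad commit is in lo+1 .. hi.
            return commits[lo + 1:hi + 1]
        target = (lo + hi) // 2
        # Test the testable commit closest to the midpoint.
        mid = min(testable, key=lambda i: abs(i - target))
        result = verdict(commits[mid])
        if result == "good":
            lo = mid
        elif result == "bad":
            hi = mid
        elif result == "skip":
            skipped.add(mid)
        else:
            raise ValueError(f"unexpected verdict {result!r} for {commits[mid]}")
```

Without skips, each verdict halves the remaining range, so a range of 100 commits needs at most 7 verdicts. The monotone assumption is the weak point for this particular bug: because the failure needs long hibernation times to show up, a commit marked "good" after too short a run may actually be bad, and one such false "good" sends every later step in the wrong direction.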