Re: [qubes-users] Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes

2020-01-17 Thread Guerlan


On Tuesday, January 14, 2020 at 7:58:11 AM UTC-3, Abel Luck wrote:

Re: [qubes-users] Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes

2020-01-17 Thread Guerlan


On Saturday, January 11, 2020 at 4:37:17 PM UTC-3, Claudia wrote:

Re: [qubes-users] Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes

2020-01-14 Thread Abel Luck
Abel Luck:
> Hi there,
> 
> I'm debugging similar resume issues, though on different hardware. 
> Hopefully you don't mind if we share tips in this thread.
>  
> 
>>> I couldn't find anything related to those ACPI devices. I first thought
>>> that there was a driver for them, so I could just rmmod those drivers
>>> before sleep and insmod them on wakeup, but I couldn't find anything.
>>> There's this issue
>>> https://ubuntuforums.org/archive/index.php/t-2393029.html which has
>>> those exact hash matches, but no answer.
>>
>> I don't know a lot about pm_trace, but it seems like there might be a 
>> problem decoding the hash.
>> Normally it should show you a PCI address, /sys device name, driver name, 
>> or something more
>> specific (see example in link below).
>>
>> According to s2ram kernel documentation:
>>
>> If no device matches the hash (or any matches appear to be false 
>> positives), the culprit may be a
>> device from a loadable kernel module that is not loaded until after the 
>> hash is checked. You can
>> check the hash against the current devices again after more modules are 
>> loaded using sysfs:
>>
>> cat /sys/power/pm_trace_dev_match
>>
>> https://www.kernel.org/doc/html/latest/power/s2ram.html#using-trace-resume
>>
>> However, in qubes we may also have the opposite problem. Qubes takes over 
>> your network cards and
>> sometimes USB controllers in early userspace, so the drivers are not 
>> available anytime. To disable
>> this behavior for USB controllers, remove rd.qubes.hide_all_usb from the 
>> kernel cmdline. For
>> network cards it's a little more complicated.
>>
>> You can try modifying the qubes initramfs hook. First, make sure there are 
>> no VMs configured to
>> start automatically at boot. Move 
>> /usr/lib/dracut/modules.d/90qubes-pciback/ to your home
>> directory, or open the qubes-pciback.sh file and comment out the last 9 or 
>> so lines (from "for dev
>> in $HIDEPCI"). Rebuild the initramfs. Then, do the pm_trace again as you 
>> did before. Then, try
>> pm_trace_dev_match as described in the link above.
>>
>> It might give you better information about the problem device, or it might 
>> just give you the same
>> info as before, but it's something to try.
>>
>> If it doesn't work, don't forget to put that file back how it was, and 
>> rebuild initramfs again.
>>
> 
> Thanks for this tip. Using this method I was able to get a "hash matches" 
> line in my dmesg whereas before I didn't get one.
> 
> I am also debugging a suspend/resume issue, but with an Asus z390 I Aorus Pro
> Wifi motherboard on a desktop (and an NVIDIA GPU, unfortunately).
> 
> Some interesting facts:
> 
> 1) the pci device that matched was "INT34B9:00". I can't really find 
> much info about what this device is, it doesn't correspond to anything 
> under lspci. /sys/bus/acpi/devices/INT34B9:00/uid contains the value 
> "SerialIoUart1"
> 
> 2) suspend and resume works when I execute "echo mem > /sys/power/state". 
> However when I execute the suspend from xfce or run systemctl suspend, the 
> resume fails (with a black screen but the keyboard lights up).
>  

>> Just some general tips: try kernel-latest, and Qubes R4.1, if you haven't 
>> yet.


Some interesting news: TL;DR, I got suspend/resume working!
Here's how:


I updated dom0 to kernel-latest, rebooted, and with all VMs off
tested suspend with this script:

```
#!/bin/sh

sync
echo 1 > /sys/power/pm_trace
echo mem > /sys/power/state
```

Resume worked. However, as soon as I started sys-usb, resume failed
again, with the monitor staying off but the keyboard lights turning on.

At this point I went into my BIOS and disabled all the devices I could:
WLAN adapter, Ethernet adapter, graphics, etc.

Throughout this process I was constantly checking for the "hash matches"
devices in dmesg and looking at /sys/power/pm_trace_dev_match. I had also
edited qubes-pciback.sh as described by Claudia. There was never a
clear smoking gun that pointed to a particular device, and the values
seemed to change with every reboot or configuration change. However, at
one point I noticed 'drm' in pm_trace_dev_match, and this would prove
useful later.

My motherboard has integrated Intel graphics (igfx) but also a PCIe
NVIDIA card. Eventually I found the BIOS setting where I could enable
integrated graphics (I had no option to disable the NVIDIA card
aside from physically removing it).

Booting into Qubes using the igfx output, I noticed 'drm' in
pm_trace_dev_match, which I know has something to do with the nouveau
driver. So I disabled nouveau as described at
https://www.qubes-os.org/doc/nvidia-troubleshooting/#disabling-nouveau.

Then resume worked!
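
Before re-testing suspend, it can help to confirm that nouveau is really no longer loaded. The following is a minimal sketch, not part of the original message: `module_loaded` is a hypothetical helper name, and it assumes the usual `/proc/modules` layout in which the first field of each line is the module name.

```shell
#!/bin/sh
# Succeed if a kernel module is listed as loaded.
# $1: module name, $2: module list (defaults to /proc/modules; the
# second argument exists only so the function can be tested on a copy).
module_loaded() {
    name=$1
    list=${2:-/proc/modules}
    # Compare the first field only, so that a module which merely
    # depends on "$name" does not count as a match.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# Example use in dom0:
#   module_loaded nouveau && echo "nouveau is still loaded"
```

If the module still shows up after a reboot, the blacklist settings from the linked page probably did not reach the dom0 kernel command line.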

I could have left it there and relied on igfx alone, but I hadn't had
any problems with nouveau, and for various reasons I want to use it rather
than igfx. So on a hunch I tried the opposite approach. I disabled igfx
in the BIOS and then added iommu=no-igfx to GRUB_CMDLINE_LINUX (not
the XEN line) and 

Re: [qubes-users] Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes

2020-01-13 Thread Abel Luck
Hi there,

I'm debugging similar resume issues, though on different hardware. 
Hopefully you don't mind if we share tips in this thread.
 

> > I couldn't find anything related to those ACPI devices. I first thought
> > that there was a driver for them, so I could just rmmod those drivers
> > before sleep and insmod them on wakeup, but I couldn't find anything.
> > There's this issue
> > https://ubuntuforums.org/archive/index.php/t-2393029.html which has
> > those exact hash matches, but no answer.
>
> I don't know a lot about pm_trace, but it seems like there might be a 
> problem decoding the hash.
> Normally it should show you a PCI address, /sys device name, driver name, 
> or something more
> specific (see example in link below).
>
> According to s2ram kernel documentation:
>
> If no device matches the hash (or any matches appear to be false 
> positives), the culprit may be a
> device from a loadable kernel module that is not loaded until after the 
> hash is checked. You can
> check the hash against the current devices again after more modules are 
> loaded using sysfs:
>
> cat /sys/power/pm_trace_dev_match
>
> https://www.kernel.org/doc/html/latest/power/s2ram.html#using-trace-resume
>
> However, in qubes we may also have the opposite problem. Qubes takes over 
> your network cards and
> sometimes USB controllers in early userspace, so the drivers are not 
> available anytime. To disable
> this behavior for USB controllers, remove rd.qubes.hide_all_usb from the 
> kernel cmdline. For
> network cards it's a little more complicated.
>
> You can try modifying the qubes initramfs hook. First, make sure there are 
> no VMs configured to
> start automatically at boot. Move 
> /usr/lib/dracut/modules.d/90qubes-pciback/ to your home
> directory, or open the qubes-pciback.sh file and comment out the last 9 or 
> so lines (from "for dev
> in $HIDEPCI"). Rebuild the initramfs. Then, do the pm_trace again as you 
> did before. Then, try
> pm_trace_dev_match as described in the link above.
>
> It might give you better information about the problem device, or it might 
> just give you the same
> info as before, but it's something to try.
>
> If it doesn't work, don't forget to put that file back how it was, and 
> rebuild initramfs again.
>

Thanks for this tip. Using this method I was able to get a "hash matches" 
line in my dmesg whereas before I didn't get one.

I am also debugging a suspend/resume issue, but with an Asus z390 I Aorus Pro 
Wifi motherboard on a desktop (and an NVIDIA GPU, unfortunately).

Some interesting facts:

1) The device that matched was "INT34B9:00". I can't find much info 
about what this device is, and it doesn't correspond to anything 
under lspci. /sys/bus/acpi/devices/INT34B9:00/uid contains the value 
"SerialIoUart1".

2) suspend and resume works when I execute "echo mem > /sys/power/state". 
However when I execute the suspend from xfce or run systemctl suspend, the 
resume fails (with a black screen but the keyboard lights up).
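
For item 1, the sysfs directory of an ACPI device usually exposes a few more attributes than `uid`, which can help identify names such as "INT34B9:00" or "device:39". This is a rough sketch under assumptions: `acpi_dev_info` is a hypothetical helper, and the attribute names `path`, `hid`, `uid` and `status` are the ones commonly present under /sys/bus/acpi/devices; any that are missing are simply skipped.

```shell
#!/bin/sh
# Print identifying sysfs attributes of one ACPI device.
# $1: device name as shown in the "hash matches" line
# $2: sysfs root (defaults to /sys/bus/acpi/devices; override for testing)
acpi_dev_info() {
    dev=$1
    root=${2:-/sys/bus/acpi/devices}
    for attr in path hid uid status; do
        f="$root/$dev/$attr"
        # Not every device has every attribute; print only readable ones.
        [ -r "$f" ] && printf '%s: %s\n' "$attr" "$(cat "$f")"
    done
    return 0
}

# Example use in dom0:
#   acpi_dev_info INT34B9:00
#   acpi_dev_info device:39
```

The `path` value is the device's name in the ACPI namespace, which can then be searched for in a disassembled DSDT.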
 

> > I also tried `pcie_aspm=force` on `/boot/efi/EFI/qubes/xen.cfg` (is this 
> where I put kernel
> > parameters?) like this:
>
> Yes on R4.0 you use xen.cfg. On other releases, you use /etc/default/grub. 
> Unfortunately I don't
> know anything about ASPM so you probably know more than I do.
>

I also don't know much about ASPM, but I noticed my BIOS had a section for 
"Active State Power Management" which was disabled. I enabled it (and the 
sub-options that appeared) but still haven't had any luck.


> If anyone has other debug ideas, I'm very thankful!
>
> Just some general tips: try kernel-latest, and Qubes R4.1, if you haven't 
> yet.
>

I'm still on 4.0. How does one try 4.1 without a full reinstall?
 

> Also make sure your
> firmware is up to date. If your machine has a dGPU, disable it in BIOS.
>
> It doesn't sound like the CPUID Xen panic I had on my machine, but you 
> could try the Xen patch
> anyway, if nothing else works. In my case, only the fan came back on, but 
> not the screen backlight
> or anything else.
>
> I also had to pin dom0 to CPU 0 to fix a different problem (my SATA 
> controller was broken after
> resume). Add the following to your Xen cmdline ("options=", not 
> "kernel="!): "dom0_max_vcpus=1
> dom0_vcpus_pin"
>
>
Will give these a try.
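
The dom0 pinning suggestion quoted above amounts to appending two parameters to the `options=` line of xen.cfg. A minimal sketch of that edit follows; `add_xen_options` is a hypothetical helper that only prints the modified file, so the result can be reviewed before anything is overwritten. It assumes the parameters contain no `|` or `&` characters, which would confuse sed.

```shell
#!/bin/sh
# Append Xen parameters to every "options=" line of a xen.cfg-style file.
# The result goes to stdout; the input file itself is not modified.
# $1: path to the config file, $2: parameters to append
add_xen_options() {
    file=$1
    extra=$2
    # Only lines starting with "options=" are touched; "kernel=" lines
    # (the dom0 Linux command line) are left alone.
    sed "/^options=/ s|\$| $extra|" "$file"
}

# Example use in dom0 (R4.0 EFI path from this thread):
#   add_xen_options /boot/efi/EFI/qubes/xen.cfg \
#       'dom0_max_vcpus=1 dom0_vcpus_pin' > /tmp/xen.cfg.new
#   diff /boot/efi/EFI/qubes/xen.cfg /tmp/xen.cfg.new
```

Keeping a backup of the original xen.cfg is advisable, since a broken file can leave the system unbootable.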

I have both iwlwifi and nouveau, which are definitely top suspects. However, 
they haven't given me any issues, and so far no evidence points to them 
being responsible.

~abel


Re: [qubes-users] Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes

2020-01-11 Thread Claudia
January 9, 2020 1:55 AM, "Guerlan"  wrote:

> First of all, here's the HCL for my Razer Blade Stealth 2016 4K touchscreen 
> 16gb RAM 512gb SSD:
> https://groups.google.com/forum/#!searchin/qubes-users/razer$20blade|sort:date/qubes-users/PalZ-1inxnA/D3mQ4OI3CAAJ
> 
> When I close the lid and open it again, the keyboard won't light up, the
> screen won't turn on (it's an LED panel, so I can see a bright black when
> it turns on), and pressing keys or the touchpad does nothing. I have to
> reboot. I don't know, however, whether the keyboard not lighting up when I
> open the lid is because sys-usb, which contains the keyboard, is not woken
> up. Every other aspect of the laptop seems to be working perfectly.

When you're testing, make sure there are no VMs set to start on boot, 
especially not sys-net and
sys-usb, and make sure rd.qubes.hide_all_usb is not set. You can try to get 
that stuff working
later on.

Does pressing caps lock or num lock turn on/off their lights on the keyboard? 
Does ctrl-alt-delete,
or Alt-SysRq-B (you have to enable it first) cause it to reboot? If you suspend 
with sound playing,
can you hear it when you try to resume?
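
On the "you have to enable it first" point: whether Alt-SysRq-B works depends on the value in /proc/sys/kernel/sysrq. The sketch below is an illustration only; `sysrq_allows_reboot` is a hypothetical helper. It relies on the kernel's documented bitmask, where 1 enables all SysRq functions and bit 128 permits reboot/poweroff.

```shell
#!/bin/sh
# Succeed if the given kernel.sysrq value permits the reboot key.
# $1: numeric value as read from /proc/sys/kernel/sysrq
sysrq_allows_reboot() {
    value=$1
    # A value of exactly 1 enables every SysRq function.
    [ "$value" -eq 1 ] && return 0
    # Otherwise the value is a bitmask; bit 128 covers reboot/poweroff.
    [ $(( value & 128 )) -ne 0 ]
}

# Example use in dom0, before suspending (second command needs root):
#   sysrq_allows_reboot "$(cat /proc/sys/kernel/sysrq)" || \
#       echo 1 > /proc/sys/kernel/sysrq
```

If Alt-SysRq-B reboots the machine after a failed resume, the kernel is still alive and the problem is more likely in the display or input path.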

> I followed Ubuntu's guide on kernel suspend bugs: 
> https://wiki.ubuntu.com/DebuggingKernelSuspend
> 
> Then, following what they suggest, I ran
> 
> `sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"`
> 
> and looked for the lines that say "hash matches" in dmesg right after reboot
> (what does that mean?)
> 
> Well, I found two:
> 
> ```
> [ 3.583591] ima: Allocated hash algorithm: sha1
> [ 3.593050] input: AT Raw Set 2 keyboard as 
> /devices/platform/i8042/serio0/input/input4
> [ 3.638808] Magic number: 0:929:176
> [ 3.638867] acpi device:39: hash matches
> [ 3.638893] acpi device:0c: hash matches
> [ 3.639073] rtc_cmos 00:01: setting system clock to 2016-01-01 12:09:51 UTC 
> (1451650191)
> ```
> 
> I couldn't find anything related to those ACPI devices. I first thought
> that there was a driver for them, so I could just rmmod those drivers
> before sleep and insmod them on wakeup, but I couldn't find anything.
> There's this issue
> https://ubuntuforums.org/archive/index.php/t-2393029.html which has
> those exact hash matches, but no answer.

I don't know a lot about pm_trace, but it seems like there might be a problem 
decoding the hash.
Normally it should show you a PCI address, /sys device name, driver name, or 
something more
specific (see example in link below).

According to s2ram kernel documentation:

If no device matches the hash (or any matches appear to be false positives), 
the culprit may be a
device from a loadable kernel module that is not loaded until after the hash is 
checked. You can
check the hash against the current devices again after more modules are loaded 
using sysfs:

cat /sys/power/pm_trace_dev_match

https://www.kernel.org/doc/html/latest/power/s2ram.html#using-trace-resume
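
To make the pm_trace check repeatable, the relevant kernel log lines can be filtered out first and then compared with the sysfs file. This is a small sketch; `pm_trace_lines` is a hypothetical helper name, and the patterns are taken from the dmesg excerpt quoted earlier in this message.

```shell
#!/bin/sh
# Print only the pm_trace-related lines from a kernel log read on stdin.
# "Magic number" is the stored trace value; "hash matches" lines name
# the candidate devices.
pm_trace_lines() {
    grep -E 'Magic number:|hash matches'
}

# Example use in dom0, right after the reboot that follows a failed resume:
#   dmesg | pm_trace_lines
#   cat /sys/power/pm_trace_dev_match
```

Running the second command again after more modules have loaded is the re-check the kernel documentation describes.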

However, in qubes we may also have the opposite problem. Qubes takes over your 
network cards and
sometimes USB controllers in early userspace, so the drivers are not available 
anytime. To disable
this behavior for USB controllers, remove rd.qubes.hide_all_usb from the kernel 
cmdline. For
network cards it's a little more complicated.

You can try modifying the qubes initramfs hook. First, make sure there are no 
VMs configured to
start automatically at boot. Move /usr/lib/dracut/modules.d/90qubes-pciback/ to 
your home
directory, or open the qubes-pciback.sh file and comment out the last 9 or so 
lines (from "for dev
in $HIDEPCI"). Rebuild the initramfs. Then, do the pm_trace again as you did 
before. Then, try
pm_trace_dev_match as described in the link above.

It might give you better information about the problem device, or it might just 
give you the same
info as before, but it's something to try.

If it doesn't work, don't forget to put that file back how it was, and rebuild 
initramfs again.
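
The move-and-rebuild steps above, and their reversal, could be sketched as follows. This is an assumption-laden outline rather than a tested procedure: the function names and `BACKUP_DIR` are hypothetical, `dracut -f` is assumed to regenerate the initramfs for the running kernel, and by default the script only prints what it would do (set `DRY_RUN=0` and run as root to apply).

```shell
#!/bin/sh
# Temporarily disable the qubes-pciback dracut module, or restore it.
PCIBACK_DIR=/usr/lib/dracut/modules.d/90qubes-pciback   # path from this message
BACKUP_DIR=${BACKUP_DIR:-$HOME/90qubes-pciback}         # hypothetical backup location

# Run a command, or just print it when DRY_RUN is 1 (the default).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_pciback() {
    run mv "$PCIBACK_DIR" "$BACKUP_DIR"
    run dracut -f    # assumption: rebuilds the current kernel's initramfs
}

restore_pciback() {
    run mv "$BACKUP_DIR" "$PCIBACK_DIR"
    run dracut -f
}

# Example use:
#   disable_pciback                 # preview only
#   DRY_RUN=0 disable_pciback       # apply, as root
```

Keeping the restore step next to the disable step makes it harder to forget to put the module back after testing.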

> Then I asked for help on a forum and they found this problematic line on my 
> dmesg:
> 
> `[ 2.543596] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM`
> 
> It seems ASPM is disabled on my Qubes. I don't know why. Should this be
> considered a bug? Is there anything I can do to get it working? This looks
> promising.
> 
> It's worth noting that Ubuntu 18 and 19, Fedora 30, Linux Mint, etc. all
> work like a charm with the sleep process. I can close the lid and open it
> and it works. So the problem seems to be **related to Qubes**. I even tried
> Qubes' most recent dom0 kernel, based on the 5.x Linux kernel, but the
> problem persists.

There's a pretty big difference between Fedora and Qubes. R4.0 is based on 
Fedora 25, not 30. Also
have you tried suspend on any of those OSes with Xen installed and running? Or, 
have you tried
booting Qubes without Xen? (Here's how to boot Qubes 4.0 without Xen:
https://www.mail-archive.com/qubes-users@googlegroups.com/msg31138.html - 
however it may be easier
for you to install Qubes 4.1 on a removable drive to test 

[qubes-users] Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes

2020-01-08 Thread Guerlan
First of all, here's the HCL for my Razer Blade Stealth 2016 4K touchscreen 
16gb RAM 512gb SSD: 
https://groups.google.com/forum/#!searchin/qubes-users/razer$20blade%7Csort:date/qubes-users/PalZ-1inxnA/D3mQ4OI3CAAJ

When I close the lid and open it again, the keyboard won't light up, the 
screen won't turn on (it's an LED panel, so I can see a bright black when it 
turns on), and pressing keys or the touchpad does nothing. I have to reboot. 
I don't know, however, whether the keyboard not lighting up when I open the 
lid is because sys-usb, which contains the keyboard, is not woken up. Every 
other aspect of the laptop seems to be working perfectly.

I followed Ubuntu's guide on kernel suspend bugs: 
https://wiki.ubuntu.com/DebuggingKernelSuspend

Then, following what they suggest, I ran

`sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"`

and looked for the lines that say "hash matches" in dmesg right after reboot 
(what does that mean?)

Well, I found two:

```
[3.583591] ima: Allocated hash algorithm: sha1
[3.593050] input: AT Raw Set 2 keyboard as 
/devices/platform/i8042/serio0/input/input4
[3.638808]   Magic number: 0:929:176
[3.638867] acpi device:39: hash matches
[3.638893] acpi device:0c: hash matches
[3.639073] rtc_cmos 00:01: setting system clock to 2016-01-01 12:09:51 
UTC (1451650191)
```

I couldn't find anything related to those ACPI devices. I first thought 
that there was a driver for them, so I could just rmmod those drivers 
before sleep and insmod them on wakeup, but I couldn't find anything. There's 
this issue https://ubuntuforums.org/archive/index.php/t-2393029.html which 
has those exact hash matches, but no answer. 

Then I asked for help on a forum and they found this problematic line on my 
dmesg:

`[2.543596] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM`

It seems ASPM is disabled on my Qubes. I don't know why. Should this be 
considered a bug? Is there anything I can do to get it working? *This looks 
promising.*

It's worth noting that Ubuntu 18 and 19, Fedora 30, Linux Mint, etc. *all 
work like a charm with the sleep process*. I can close the lid and open it 
and it works. So the problem seems to be **related to Qubes**. I even tried 
Qubes' most recent dom0 kernel, based on the 5.x Linux kernel, but the 
problem persists.

I also tried adding `pcie_aspm=force` in `/boot/efi/EFI/qubes/xen.cfg` (is 
this where I put kernel parameters?) like this:

`kernel=vmlinuz-4.14.74-1.pvops.qubes.x86_64 
root=/dev/mapper/qubes_dom0-root 
rd.luks.uuid=luks-39fc83eb-9829-43b7-86e8-08068bd81087 
rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.alpha_support=1 
pcie_aspm=force rhgb quiet plymouth.ignore-serial-consoles`

but it didn't help.
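
One way to check whether a parameter added to the `kernel=` line actually reached the dom0 kernel is to look at /proc/cmdline after rebooting. A minimal sketch, with `cmdline_has` as a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if a parameter appears as a whole word on a kernel command line.
# $1: parameter, e.g. pcie_aspm=force
# $2: file to read (defaults to /proc/cmdline; override for testing)
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}
    # Split on spaces so that only exact, whole-parameter matches count.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example use in dom0:
#   cmdline_has pcie_aspm=force && echo "parameter is active"
#   cmdline_has rd.qubes.hide_all_usb && echo "USB controllers are hidden"
```

If the parameter is missing from /proc/cmdline, the edit went into a file or section that the bootloader does not use.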

I practically *need* to run Qubes on this notebook because any Linux 
distribution with any kernel has a problem that corrupts my SSD many 
times a day. No one could solve it, and on Qubes it never happens. I tried 
Qubes just to see if it would solve the problem, and it does! I'm loving it 
and not going back, even on other notebooks. However, closing the lid to put 
the system to sleep is essential for a notebook.

```
[lz@dom0 ~]$ cat /sys/power/mem_sleep 
s2idle [deep]
```

As you can see, the default suspend mode is deep.
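
The bracketed entry in that file is the mode used by `echo mem > /sys/power/state`. A small sketch for extracting it in scripts follows; `active_mem_sleep` is a hypothetical helper that reads the file contents on stdin.

```shell
#!/bin/sh
# Print the mem_sleep mode that is marked with [brackets].
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example use in dom0:
#   active_mem_sleep < /sys/power/mem_sleep
# To switch the mode for the current boot (as root):
#   echo s2idle > /sys/power/mem_sleep
```

Writing to mem_sleep changes what a lid close does without touching the kernel command line, which makes it convenient for quick comparisons between deep and s2idle.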

I tried s2idle by doing `echo freeze > /sys/power/state`; the screen 
turns off but the keyboard lights stay on. Pressing keys does nothing, and 
neither does pressing the touchpad or briefly pressing the power button. I 
had to reboot by long-pressing power. I thought s2idle should always work 
since it's software based. 

Here's my journalctl of the moment when I go to suspend by closing the lid 
(that is, suspending in deep mode):

```
Jan 07 20:56:24 dom0 systemd-logind[1925]: Lid closed.
Jan 07 20:56:24 dom0 systemd-logind[1925]: Suspending...
Jan 07 20:56:24 dom0 systemd[1]: Starting Qubes suspend hooks...
Jan 07 20:56:25 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(
xen_free_memory=8172072647, total_mem_pref=2493652659.2, 
total_available_memory=13171544083.8)
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '5' 
act=3198156800 pref=963591782.4 last_target=3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '0' 
act=4294967296 pref=1530060876.8 last_target=4294967296
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 
memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 
3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 
4294967296
Jan 07 20:56:25 dom0 qrexec[3884]: qubes.GetDate: social -> @default: 
allowed to dom0
Jan 07 20:56:25 dom0 qmemman.daemon.algo[1921]: 
balance_when_enough_memory(xen_free_memory=8172072647, 
total_mem_pref=2450575027.2, total_available_memory=13214621715.8)
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '5' 
act=3198156800 pref=920514150.4 last_target=3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '0' 
act=4294967296