Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread Masanobu SAITOH

Hi, David.

On 2018/11/28 6:09, David Brownlee wrote:
> On Tue, 27 Nov 2018 at 18:10, David Brownlee wrote:
> >
> > On Tue, 27 Nov 2018 at 08:27, Masanobu SAITOH wrote:
> > >
> > >   Hi, David.
> > >
> > > On 2018/11/26 6:11, David Brownlee wrote:
> > > > I've bisected the changes against the github src copy, and it looks like
> > > > the suspend/resume issue is related to the following commit:
> > > >
> > > > commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
> > > > Author: jdolecek
> > > > Date:   Mon Oct 22 20:57:07 2018 +
> > > >
> > > >  enable MSI support where available, blatantly copied from jmcneill's
> > > > msk(4)
> > > >
> > > > I tried building from HEAD with just that one commit reverted, and my
> > > > T420s suspends and resumes again!
> > > >
> > > > iwn0 is still non responsive after resume and wm0 will not pick up an IP
> > > > via dhcpcd, but the disk responds :-p
> > >
> > >   (Note that I'm not familiar with suspend/resume though...)
> > >
> > >   Our pci_suspend()/pci_resume() copy only the first 16 bytes of each PCI
> > > config space. Other OSes copy some other control registers and
> > > the MSI/MSI-X capability area.
> > >
> > >   Could you dump all PCI config space both before and after suspend with:
> > >
> > > http://www.netbsd.org/~msaitoh/pcidump
> > >
> > > and put the two outputs somewhere? Diffing the two outputs will show
> > > us what we have to do.
> > >
> > >   Thanks in advance.
> >
> > Let me just install to a USB stick to give me a working filesystem
> > from which to run pcidump after resume :-p
>
> Collecting a pre-suspend dump was easy, but getting post-resume turned
> out to be a little more involved :)
> - root on wd0 on ahcisata - times out on resume
> - root on sd0 on usb on xhci - times out on resume
> - root on sd0 on usb on uhci - loses the root filesystem mount point on resume
> - install image - doesn't have the libs to run pcictl
> - install image, then chroot to mfs with extracted base - suspends but
> video does not come back (no drm)
> - root on wd0, then chroot to mfs with extracted base, suspend &
> resume, then mount sd0 on usb on uhci to save data - \o/
>
> After all that it occurred to me I could have probably run the
> suspend/resume with an older NetBSD version where MSI was not being
> used. Still, interesting puzzle to try, and useful technique to stash.
>
> Files for the ThinkPad T420s:
>
> http://ftp.netbsd.org/pub/NetBSD/misc/abs/acpi-suspend-resume/pcidump.pre
> http://ftp.netbsd.org/pub/NetBSD/misc/abs/acpi-suspend-resume/pcidump.post


The diff says we should save/restore the MSI table.
We should also save/restore some other registers.

 Give me one or two days to resolve the problem.


 Thanks.
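
As an illustration of the diffing step, the comparison can be sketched in
C. This is only a sketch under stated assumptions: the two 256-byte arrays
stand in for one device's configuration space captured before suspend and
after resume, and the helper name cfg_diff is hypothetical. It is not taken
from pcidump or from the NetBSD kernel.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_SIZE 256	/* conventional PCI configuration space, in bytes */

/* Assemble the little-endian 32-bit register that starts at byte `off`. */
static uint32_t
cfg_reg(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	    ((uint32_t)cfg[off + 1] << 8) |
	    ((uint32_t)cfg[off + 2] << 16) |
	    ((uint32_t)cfg[off + 3] << 24);
}

/*
 * Compare two snapshots of one device's configuration space, one 32-bit
 * register at a time. Print every register that differs and return how
 * many registers differ.
 */
static int
cfg_diff(const uint8_t pre[CFG_SIZE], const uint8_t post[CFG_SIZE])
{
	int ndiff = 0;

	for (size_t off = 0; off < CFG_SIZE; off += 4) {
		uint32_t a = cfg_reg(pre, off);
		uint32_t b = cfg_reg(post, off);

		if (a != b) {
			printf("0x%02zx: 0x%08" PRIx32 " -> 0x%08" PRIx32 "\n",
			    off, a, b);
			ndiff++;
		}
	}
	return ndiff;
}
```

Registers that show up in such a diff at offsets of 0x40 and above are the
ones a header-only save/restore cannot bring back.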



> Thanks for looking at this!
>
> David




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Automated report: NetBSD-current/i386 build success

2018-11-27 Thread NetBSD Test Fixture
The NetBSD-current/i386 build is working again.

The following commits were made between the last failed build and the
successful build:

2018.11.28.00.44.08 kre src/usr.sbin/sysinst/label.c,v 1.6

Log files can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2018.11.html#2018.11.28.00.44.08


daily CVS update output

2018-11-27 Thread NetBSD source update


Updating src tree:
P src/etc/etc.amd64/MAKEDEV.conf
P src/etc/etc.evbarm/MAKEDEV.conf
P src/etc/etc.i386/MAKEDEV.conf
P src/lib/libc/hash/sha1/Makefile.inc
P src/lib/libc/hash/sha1/sha1.3
P src/share/man/man8/man8.x86/boot.8
P src/share/man/man9/pci_msi.9
P src/sys/arch/aarch64/aarch64/netbsd32_machdep.c
P src/sys/arch/alpha/alpha/machdep.c
P src/sys/arch/amd64/amd64/netbsd32_machdep.c
P src/sys/arch/arm/acpi/acpi_platform.c
P src/sys/arch/arm/arm/sig_machdep.c
P src/sys/arch/arm/pci/pci_msi_machdep.c
P src/sys/arch/hppa/hppa/sig_machdep.c
P src/sys/arch/i386/i386/machdep.c
P src/sys/arch/m68k/m68k/sig_machdep.c
P src/sys/arch/mips/mips/netbsd32_machdep.c
P src/sys/arch/mips/mips/sig_machdep.c
P src/sys/arch/powerpc/powerpc/sig_machdep.c
P src/sys/arch/riscv/riscv/sig_machdep.c
P src/sys/arch/sh3/sh3/sh3_machdep.c
P src/sys/arch/sparc64/sparc64/machdep.c
P src/sys/arch/sparc64/sparc64/netbsd32_machdep.c
P src/sys/arch/usermode/target/i386/cpu_i386.c
P src/sys/arch/usermode/target/x86_64/cpu_x86_64.c
P src/sys/arch/vax/vax/sig_machdep.c
P src/sys/arch/x86/pci/pci_intr_machdep.c
P src/sys/dev/pci/if_bge.c
P src/sys/dev/pci/if_bgevar.h
P src/sys/modules/Makefile
P src/tests/bin/sh/t_expand.sh
P src/tests/bin/sh/t_redir.sh
P src/usr.sbin/sysinst/defs.h
P src/usr.sbin/sysinst/disks.c
P src/usr.sbin/sysinst/label.c
P src/usr.sbin/sysinst/mbr.c
P src/usr.sbin/sysinst/partman.c
P src/usr.sbin/sysinst/arch/i386/md.c

Updating xsrc tree:


Killing core files:



Updating release-7 src tree (netbsd-7):

Updating release-7 xsrc tree (netbsd-7):



Updating release-8 src tree (netbsd-8):
U doc/CHANGES-8.1
P sys/dev/dksubr.c
P sys/dev/pci/if_bge.c
P sys/dev/pci/if_wm.c
P sys/kern/subr_evcnt.c

Updating release-8 xsrc tree (netbsd-8):




Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  52363438 Nov 28 03:08 ls-lRA.gz


Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread David H. Gutteridge
On Sat, 2018-11-24 at 22:47 +, David Brownlee wrote:
> On Sat, 24 Nov 2018 at 18:52, David H. Gutteridge wrote:
> > On Fri, 2018-11-23 at 21:42 +, David Brownlee wrote:
> > > netbsd-8 Single user:
> > > - Suspend (hw.acpi.sleep.state=3) and resume appears to work
> > > reliably many times in a row
> > > - Booting multi user after suspend/resume: wireless iwn0 does not
> > > appear to work "iwn0: could not load firmware .text section"
> > 
> > I see that too. I haven't looked into it yet, but wondered if it was
> > as simple as forcing it to reload its firmware after resumption.
> 
> Mmm, the man page indicates "iwn0: could not load firmware .text
> section" is reported when it attempted to
> load the firmware from disk into the device but failed, so it may be a
> little more than that :/

That error definitely can mean just that, but it's notable that it's
not a case of the firmware file being absent or unloadable: for me
it loads successfully on boot and only gives that error on wakeup.

Dave




Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread David H. Gutteridge
On Tue, 2018-11-27 at 18:08 +, David Brownlee wrote:
> On Sun, 25 Nov 2018 at 21:11, David Brownlee  wrote:
> > I've bisected the changes against the github src copy, and it looks like 
> > the suspend/resume issue is related to the following commit:
> > 
> > commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
> > Author: jdolecek 
> > Date:   Mon Oct 22 20:57:07 2018 +
> > 
> > enable MSI support where available, blatantly copied from jmcneill's 
> > msk(4)
> > 
> > I tried building from HEAD with just that one commit reverted, and my T420s 
> > suspends and resumes again!
> > 
> > iwn0 is still non responsive after resume and wm0 will not pick up an IP 
> > via dhcpcd, but the disk responds :-p
> 
> So it turns out I'm as effective at off-by-one errors in git
> bisect as I am in coding... :/
> 
> It turns out the commit with the issue was:
> 
> commit 1628082c6b882d064bd5d77e5847c42b44b59fde (HEAD, refs/bisect/bad)
> Author: jdolecek 
> Date:   Mon Oct 22 21:04:53 2018 +
> 
> enable MSI support where available
> 
> M   sys/dev/pci/ahcisata_pci.c
> 
> Apologies...

No worries, I don't have an siisata device on that laptop, so I
figured it was ahcisata I needed to revert. I've done so, and tested,
and, yes, backing that change set out gets my laptop resuming without
disk errors again.

Thanks for the work you've put into isolating this and getting the PCI
config dumps!

Dave





Re: Automated report: NetBSD-current/i386 build failure

2018-11-27 Thread Andreas Gustafsson
The NetBSD Test Fixture wrote:
>  2/../../../../.. -DLIBSA_SINGL--- dependall-usr.sbin ---
> *** [label.o] Error code 1

A hopefully more relevant error message earlier in the log:

  --- dependall-sysinst ---
  /tmp/bracket/build/2018.11.27.20.08.05-i386/src/usr.sbin/sysinst/arch/i386/../../label.c:240:23: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
 else if (p->pi_size < (128 * GIG / 512))
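
One plausible reading of this warning, offered as an assumption because the
definition of GIG is not shown here: if GIG is a 32-bit constant equal to
2^30, then 128 * GIG is 2^37, which wraps to 0 in 32-bit unsigned
arithmetic. The limit then becomes 0 and the compiler sees an unsigned value
compared against 0. The sketch below reproduces only that arithmetic with
explicit fixed-width types; GIG32, GIG64 and the function names are
hypothetical and are not sysinst code.

```c
#include <stdint.h>

#define GIG32	((uint32_t)1024 * 1024 * 1024)	/* 2^30 held in 32 bits */
#define GIG64	((uint64_t)1024 * 1024 * 1024)	/* 2^30 held in 64 bits */

/* 128 GiB in 512-byte sectors, computed in 32-bit arithmetic. */
static uint32_t
limit_sectors_32(void)
{
	uint32_t bytes = (uint32_t)128 * GIG32;	/* 2^37 wraps to 0 */

	return bytes / 512;
}

/* The same limit computed in 64-bit arithmetic. */
static uint64_t
limit_sectors_64(void)
{
	uint64_t bytes = (uint64_t)128 * GIG64;	/* 2^37 fits */

	return bytes / 512;			/* 2^37 / 2^9 = 2^28 */
}
```

Under that assumption, widening the constant (or the multiplication) to 64
bits before multiplying would keep the limit at 2^28 sectors.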


-- 
Andreas Gustafsson, g...@gson.org


Automated report: NetBSD-current/i386 build failure

2018-11-27 Thread NetBSD Test Fixture
This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2018.11.27.20.08.05.

An extract from the build.sh output follows:


/tmp/bracket/build/2018.11.27.20.08.05-i386/tools/bin/i486--netbsdelf-objcopy 
-x  strtoi.o
--- strtou.o ---
#   compile  kern/strtou.o
/tmp/bracket/build/2018.11.27.20.08.05-i386/tools/bin/i486--netbsdelf-gcc 
-O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -Wno-main   -Os 
-ffreestanding -fomit-frame-pointer -fno-unwind-tables  
-fno-asynchronous-unwind-tables -fno-exceptions -mno-sse   -std=gnu99   -Werror 
 -march=i386 -mtune=i386   
--sysroot=/tmp/bracket/build/2018.11.27.20.08.05-i386/destdir 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/arch/i386
 -DSLOW -DCOMPAT_386BSD_MBRPART -DXMS -DLIBSA_ENABLE_LS_OP 
-DSTACK_START=0x1 
--sysroot=/tmp/bracket/build/2018.11.27.20.08.05-i386/destdir -nostdinc 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/obj/sys/arch/i386/stand/dosboot 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../..
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../arch/i386/stand/lib
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../..
 /../lib/libsa -D_STANDALONE 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/quad
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/string
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/arch/i386/string
  
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/quad
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/string
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/arch/i386/string
 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/include
  -c/tmp/bracket/bu
 
ild/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/dosboot/../../../../lib/libkern/../../../common/lib/libc/stdlib/strtou.c
 -o strtou.o
--- dependall-tests ---
#create  rc/Atffile
--- dependall-sys ---
--- dependall-bootxx ---
#   compile  sa/ext2fs.o
--- dependall-usr.sbin ---
--- dependall-sysinst ---
cc1: all warnings being treated as errors
--- dependall-sys ---
--- dependall-dosboot ---
/tmp/bracket/build/2018.11.27.20.08.05-i386/tools/bin/nbctfconvert -g -L 
VERSION strtou.o
--- dependall-bootxx ---
--- dependall-external ---
/tmp/bracket/build/2018.11.27.20.08.05-i386/tools/bin/nbctfconvert -g -L 
VERSION filemode.o
--- dependall-sys ---
/tmp/bracket/build/2018.11.27.20.08.05-i386/tools/bin/i486--netbsdelf-gcc 
-Wall -Wmissing-prototypes -Wstrict-prototypes   -Os -ffreestanding 
-fomit-frame-pointer -fno-unwind-tables  -fno-asynchronous-unwind-tables 
-fno-exceptions -mno-sse   -std=gnu99   -Werror  -march=i386 -mtune=i386  
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/bootxx/bootxx_ffsv2/../../../../../lib/libsa
 --sysroot=/tmp/bracket/build/2018.11.27.20.08.05-i386/destdir -DBOOTXX -I 
/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/bootxx/bootxx_ffsv2/../../lib
 -I 
/tmp/bracket/build/2018.11.27.20.08.05-i386/obj/sys/arch/i386/stand/bootxx/bootxx_ffsv2
 -DBOOTXX_SECTORS=15 -DPRIMARY_LOAD_ADDRESS=0x1000 
-DSECONDARY_LOAD_ADDRESS=0x1 -DXXfs_open=ffsv2_open 
-DXXfs_close=ffsv2_close -DXXfs_read=ffsv2_read -DXXfs_stat=ffsv2_stat 
-DFS=ffsv2 -DNO_LBA_CHECK -DEPIA_HACK -nostdinc -D_STANDALONE 
-I/tmp/bracket/build/2018.11.27.20.08.05-i386/src/sys/arch/i386/stand/bootxx/bootxx_ffsv
 2/../../../../.. -DLIBSA_SINGL--- dependall-usr.sbin ---
*** [label.o] Error code 1
--- dependall-sys ---
--- dependall-dosboot ---

The following commits were made between the last successful build and
the failed build:

2018.11.27.14.09.53 maxv src/sys/arch/aarch64/aarch64/netbsd32_machdep.c,v 1.3
2018.11.27.14.09.53 maxv src/sys/arch/alpha/alpha/machdep.c,v 1.352
2018.11.27.14.09.53 maxv src/sys/arch/amd64/amd64/netbsd32_machdep.c,v 1.117
2018.11.27.14.09.53 maxv src/sys/arch/arm/arm/sig_machdep.c,v 1.51
2018.11.27.14.09.53 maxv src/sys/arch/hppa/hppa/sig_machdep.c,v 1.26
2018.11.27.14.09.53 maxv src/sys/arch/i386/i386/machdep.c,v 1.813
2018.11.27.14.09.54 maxv src/sys/arch/m68k/m68k/sig_machdep.c,v 1.50
2018.11.27.14.09.54 maxv src/sys/arch/mips/mips/netbsd32_machdep.c,v 1.16

Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread David Brownlee
On Tue, 27 Nov 2018 at 18:10, David Brownlee  wrote:
>
> On Tue, 27 Nov 2018 at 08:27, Masanobu SAITOH  wrote:
> >
> >   Hi, David.
> >
> > On 2018/11/26 6:11, David Brownlee wrote:
> > > I've bisected the changes against the github src copy, and it looks like 
> > > the suspend/resume issue is related to the following commit:
> > >
> > > commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
> > > Author: jdolecek 
> > > Date:   Mon Oct 22 20:57:07 2018 +
> > >
> > >  enable MSI support where available, blatantly copied from jmcneill's 
> > > msk(4)
> > >
> > > I tried building from HEAD with just that one commit reverted, and my 
> > > T420s suspends and resumes again!
> > >
> > > iwn0 is still non responsive after resume and wm0 will not pick up an IP 
> > > via dhcpcd, but the disk responds :-p
> >
> >   (Note that I'm not familiar with suspend/resume though...)
> >
> >   Our pci_suspend()/pci_resume() copy only the first 16 bytes of each PCI
> > config space. Other OSes copy some other control registers and
> > the MSI/MSI-X capability area.
> >
> >   Could you dump all PCI config space both before and after suspend with:
> >
> > http://www.netbsd.org/~msaitoh/pcidump
> >
> > and put the two outputs somewhere? Diffing the two outputs will show
> > us what we have to do.
> >
> >   Thanks in advance.
>
> Let me just install to a USB stick to give me a working filesystem
> from which to run pcidump after resume :-p

Collecting a pre-suspend dump was easy, but getting post-resume turned
out to be a little more involved :)
- root on wd0 on ahcisata - times out on resume
- root on sd0 on usb on xhci - times out on resume
- root on sd0 on usb on uhci - loses the root filesystem mount point on resume
- install image - doesn't have the libs to run pcictl
- install image, then chroot to mfs with extracted base - suspends but
video does not come back (no drm)
- root on wd0, then chroot to mfs with extracted base, suspend &
resume, then mount sd0 on usb on uhci to save data - \o/

After all that it occurred to me I could have probably run the
suspend/resume with an older NetBSD version where MSI was not being
used. Still, interesting puzzle to try, and useful technique to stash.

Files for the ThinkPad T420s:

http://ftp.netbsd.org/pub/NetBSD/misc/abs/acpi-suspend-resume/pcidump.pre
http://ftp.netbsd.org/pub/NetBSD/misc/abs/acpi-suspend-resume/pcidump.post

Thanks for looking at this!

David


Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread David Brownlee
On Tue, 27 Nov 2018 at 08:27, Masanobu SAITOH  wrote:
>
>   Hi, David.
>
> On 2018/11/26 6:11, David Brownlee wrote:
> > I've bisected the changes against the github src copy, and it looks like 
> > the suspend/resume issue is related to the following commit:
> >
> > commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
> > Author: jdolecek 
> > Date:   Mon Oct 22 20:57:07 2018 +
> >
> >  enable MSI support where available, blatantly copied from jmcneill's 
> > msk(4)
> >
> > I tried building from HEAD with just that one commit reverted, and my T420s 
> > suspends and resumes again!
> >
> > iwn0 is still non responsive after resume and wm0 will not pick up an IP 
> > via dhcpcd, but the disk responds :-p
>
>   (Note that I'm not familiar with suspend/resume though...)
>
>   Our pci_suspend()/pci_resume() copy only the first 16 bytes of each PCI
> config space. Other OSes copy some other control registers and
> the MSI/MSI-X capability area.
>
>   Could you dump all PCI config space both before and after suspend with:
>
> http://www.netbsd.org/~msaitoh/pcidump
>
> and put the two outputs somewhere? Diffing the two outputs will show
> us what we have to do.
>
>   Thanks in advance.

Let me just install to a USB stick to give me a working filesystem
from which to run pcidump after resume :-p

David


Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread David Brownlee
On Sun, 25 Nov 2018 at 21:11, David Brownlee  wrote:
>
> I've bisected the changes against the github src copy, and it looks like the 
> suspend/resume issue is related to the following commit:
>
> commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
> Author: jdolecek 
> Date:   Mon Oct 22 20:57:07 2018 +
>
> enable MSI support where available, blatantly copied from jmcneill's 
> msk(4)
>
> I tried building from HEAD with just that one commit reverted, and my T420s 
> suspends and resumes again!
>
> iwn0 is still non responsive after resume and wm0 will not pick up an IP via 
> dhcpcd, but the disk responds :-p

So it turns out I'm as effective at off-by-one errors in git
bisect as I am in coding... :/

It turns out the commit with the issue was:

commit 1628082c6b882d064bd5d77e5847c42b44b59fde (HEAD, refs/bisect/bad)
Author: jdolecek 
Date:   Mon Oct 22 21:04:53 2018 +

enable MSI support where available

M   sys/dev/pci/ahcisata_pci.c

Apologies...
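
As a reminder of where the off-by-one can creep in, here is a minimal sketch
of the search a bisect performs. It assumes a linear history in which every
commit after the first bad one is also bad. The names (first_bad, is_bad_fn,
bad_from_7) and the commit indices are hypothetical; this is not git's
implementation.

```c
#include <stdbool.h>

typedef bool (*is_bad_fn)(int commit);

/*
 * Return the index of the first bad commit in (good, bad], given that
 * commit `good` is known good and commit `bad` is known bad.
 */
static int
first_bad(int good, int bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* first bad commit is at or before mid */
		else
			good = mid;	/* first bad commit is after mid */
	}
	/* Returning `good` here would blame the commit just before. */
	return bad;
}

/* Example predicate: pretend the regression arrived with commit 7. */
static bool
bad_from_7(int commit)
{
	return commit >= 7;
}
```

With these definitions, first_bad(0, 10, bad_from_7) narrows the range to
commits 6 and 7 and reports 7, the later of the two.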

David


Automated report: NetBSD-current/i386 test failure

2018-11-27 Thread NetBSD Test Fixture
This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

usr.bin/c++/t_asan_poison:poison_pie
usr.bin/cc/t_asan_poison:poison_pie

The above tests failed in each of the last 3 test runs, and passed in
at least 27 consecutive runs before that.

The following commits were made between the last successful test and
the failed test:

2018.11.26.17.18.01 skrll src/sys/kern/kern_lwp.c,v 1.195
2018.11.26.17.37.44 joerg src/lib/csu/arch/aarch64/crt0.S,v 1.2
2018.11.26.17.37.44 joerg src/lib/csu/arch/alpha/crt0.S,v 1.2
2018.11.26.17.37.44 joerg src/lib/csu/arch/arm/crt0.S,v 1.4
2018.11.26.17.37.44 joerg src/lib/csu/arch/earm/crt0.S,v 1.4
2018.11.26.17.37.45 joerg src/lib/csu/arch/hppa/crt0.S,v 1.2
2018.11.26.17.37.45 joerg src/lib/csu/arch/i386/crt0.S,v 1.4
2018.11.26.17.37.45 joerg src/lib/csu/arch/ia64/crt0.S,v 1.2
2018.11.26.17.37.45 joerg src/lib/csu/arch/m68k/crt0.S,v 1.5
2018.11.26.17.37.45 joerg src/lib/csu/arch/mips/crt0.S,v 1.4
2018.11.26.17.37.45 joerg src/lib/csu/arch/or1k/crt0.S,v 1.2
2018.11.26.17.37.45 joerg src/lib/csu/arch/powerpc/crt0.S,v 1.7
2018.11.26.17.37.45 joerg src/lib/csu/arch/riscv/crt0.S,v 1.2
2018.11.26.17.37.45 joerg src/lib/csu/arch/sh3/crt0.S,v 1.7
2018.11.26.17.37.45 joerg src/lib/csu/arch/sparc/crt0.S,v 1.3
2018.11.26.17.37.46 joerg src/lib/csu/arch/sparc64/crt0.S,v 1.2
2018.11.26.17.37.46 joerg src/lib/csu/arch/vax/crt0.S,v 1.4
2018.11.26.17.37.46 joerg src/lib/csu/arch/x86_64/crt0.S,v 1.4
2018.11.26.17.37.46 joerg src/lib/csu/common/Makefile.inc,v 1.33
2018.11.26.17.37.46 joerg src/lib/csu/common/crt0-common.c,v 1.20
2018.11.26.17.40.26 joerg src/libexec/ld.elf_so/rtld.h,v 1.135
2018.11.26.18.08.41 ryo src/usr.sbin/cpuctl/arch/aarch64.c,v 1.4

Log files can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2018.11.html#2018.11.26.18.08.41


Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread Masanobu SAITOH

On 2018/11/27 17:27, Masanobu SAITOH wrote:
>   Hi, David.
>
> On 2018/11/26 6:11, David Brownlee wrote:
> > I've bisected the changes against the github src copy, and it looks like the
> > suspend/resume issue is related to the following commit:
> >
> > commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
> > Author: jdolecek
> > Date:   Mon Oct 22 20:57:07 2018 +
> >
> >      enable MSI support where available, blatantly copied from jmcneill's msk(4)
> >
> > I tried building from HEAD with just that one commit reverted, and my T420s
> > suspends and resumes again!
> >
> > iwn0 is still non responsive after resume and wm0 will not pick up an IP via
> > dhcpcd, but the disk responds :-p
>
>   (Note that I'm not familiar with suspend/resume though...)
>
>   Our pci_suspend()/pci_resume() copy only the first 16 bytes of each PCI

s/16 bytes/64 bytes/

> config space. Other OSes copy some other control registers and
> the MSI/MSI-X capability area.
>
>   Could you dump all PCI config space both before and after suspend with:
>
>  http://www.netbsd.org/~msaitoh/pcidump
>
> and put the two outputs somewhere? Diffing the two outputs will show
> us what we have to do.
>
>   Thanks in advance.
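
To make the 16-versus-64 correction concrete: 64 bytes is sixteen 32-bit
registers, which is the standard PCI configuration header (offsets
0x00-0x3f). Capability structures such as MSI sit above that range, so a
save/restore limited to the header does not cover them. The sketch below
models this on a plain array. The names (cfg_state, cfg_save, cfg_restore)
and the idea of passing a register count are assumptions for illustration,
not NetBSD's actual code.

```c
#include <stdint.h>
#include <string.h>

#define CFG_REGS	64	/* 256 bytes of config space as 32-bit registers */
#define HDR_REGS	16	/* 64-byte standard header, offsets 0x00-0x3f */

struct cfg_state {
	uint32_t reg[CFG_REGS];
	int nregs;		/* how many registers were actually saved */
};

/* Copy the first `nregs` registers of the device into the saved state. */
static void
cfg_save(struct cfg_state *st, const uint32_t *dev, int nregs)
{
	memcpy(st->reg, dev, (size_t)nregs * sizeof(dev[0]));
	st->nregs = nregs;
}

/* Write back only the registers that were saved; the rest stay untouched. */
static void
cfg_restore(const struct cfg_state *st, uint32_t *dev)
{
	memcpy(dev, st->reg, (size_t)st->nregs * sizeof(dev[0]));
}
```

If the array is saved with HDR_REGS, cleared (standing in for the device
losing state across suspend) and restored, a register at 0x04 comes back
but a capability register placed at, say, 0x50 stays zero. Saving with
CFG_REGS restores both. A real driver would not blindly write back every
register; the sketch only shows which range is covered.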



David

On Sat, 24 Nov 2018 at 22:47, David Brownlee wrote:

    On Sat, 24 Nov 2018 at 18:52, David H. Gutteridge wrote:
 >
 > On Fri, 2018-11-23 at 21:42 +, David Brownlee wrote:
 > > Another couple of data points in case it helps
 > >
 > > Tested on Thinkpad T420s and T530 with NetBSD/amd64 - both have
 > > similar behaviour
 > >
 > > 8.99.25 Single user:
 > > - Suspends and seems to resume but hangs on first disk access "wd0a:
 > > device timeout reading fsbn ..."
 >
 > Yes, I get that too. pgoyette@ suggested I follow up with jdolecek@
 > about it, but I haven't had time yet to look for more details. There
 > are a number of PRs that jdolecek@ was working on fixing that
 > reference "clearing WDCTL_RST failed for drive" in the dmesg. In my
 > case, I get that error on both 8.0_STABLE and 8.99.26 (after his
 > latest changes), but it seems like it's a red herring or there's more
 > to it, because 8 still resumes reliably regardless of that warning,
 > while HEAD behaves as you've seen. I just keep getting continuous
 > output with "wd0a: device timeout writing fsbn X of X..."

    I asked jdolecek@ if it might be worth bisecting to find out when the
    hang was introduced, and he replied it was.
    I've just started using the github copy of src. Mon Oct 22 2018 was "good"

 > > netbsd-8 Single user:
 > > - Suspend (hw.acpi.sleep.state=3) and resume appears to work reliably
 > > many times in a row
 > > - Booting multi user after suspend/resume: wireless iwn0 does not
 > > appear to work "iwn0: could not load firmware .text section"
 >
 > I see that too. I haven't looked into it yet, but wondered if it was
 > as simple as forcing it to reload its firmware after resumption.

    Mmm, the man page indicates "iwn0: could not load firmware .text
    section" is reported when it attempted to
    load the firmware from disk into the device but failed, so it may be a
    little more than that :/

 > (Actually, my iwn didn't work at all, originally, because it requires
 > a different firmware file than any that are distributed by NetBSD at
 > present, and needed an addition in the driver to target that firmware.
 > I made those changes in my tree and have been testing with them on
 > both 8 and HEAD.)
 >
 > > netbsd-8 Multi user no x11:
 > > - Suspends, keyboard *usually* non responsive on resume (but can
 > > switch virtual terminals)
 >
 > I've never had this problem, I've found my T420 consistently responsive
 > whether I'm at a console or have suspended with X running (typically
 > with an Xfce4 session). When it comes back, no issues there (aside from
 > iwn).

    That's definitely encouraging!

    David







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ThinkPad - suspend-to-RAM intel-x86 issues and tests

2018-11-27 Thread Masanobu SAITOH

 Hi, David.

On 2018/11/26 6:11, David Brownlee wrote:

I've bisected the changes against the github src copy, and it looks like the 
suspend/resume issue is related to the following commit:

commit 0fe469276f49bf0dc003300e0b8a35a80b7b246d (HEAD)
Author: jdolecek 
Date:   Mon Oct 22 20:57:07 2018 +

     enable MSI support where available, blatantly copied from jmcneill's msk(4)

I tried building from HEAD with just that one commit reverted, and my T420s 
suspends and resumes again!

iwn0 is still non responsive after resume and wm0 will not pick up an IP via 
dhcpcd, but the disk responds :-p


 (Note that I'm not familiar with suspend/resume though...)

 Our pci_suspend()/pci_resume() copy only the first 16 bytes of each PCI
config space. Other OSes copy some other control registers and
the MSI/MSI-X capability area.

 Could you dump all PCI config space both before and after suspend with:

http://www.netbsd.org/~msaitoh/pcidump

and put the two outputs somewhere? Diffing the two outputs will show
us what we have to do.

 Thanks in advance.



David

On Sat, 24 Nov 2018 at 22:47, David Brownlee wrote:

On Sat, 24 Nov 2018 at 18:52, David H. Gutteridge wrote:
 >
 > On Fri, 2018-11-23 at 21:42 +, David Brownlee wrote:
 > > Another couple of data points in case it helps
 > >
 > > Tested on Thinkpad T420s and T530 with NetBSD/amd64 - both have
 > > similar behaviour
 > >
 > > 8.99.25 Single user:
 > > - Suspends and seems to resume but hangs on first disk access "wd0a:
 > > device timeout reading fsbn ..."
 >
 > Yes, I get that too. pgoyette@ suggested I follow up with jdolecek@
 > about it, but I haven't had time yet to look for more details. There
 > are a number of PRs that jdolecek@ was working on fixing that
 > reference "clearing WDCTL_RST failed for drive" in the dmesg. In my
 > case, I get that error on both 8.0_STABLE and 8.99.26 (after his
 > latest changes), but it seems like it's a red herring or there's more
 > to it, because 8 still resumes reliably regardless of that warning,
 > while HEAD behaves as you've seen. I just keep getting continuous
 > output with "wd0a: device timeout writing fsbn X of X..."

I asked jdolecek@ if it might be worth bisecting to find out when the
hang was introduced, and he replied it was.
I've just started using the github copy of src. Mon Oct 22 2018 was "good"

 > > netbsd-8 Single user:
 > > - Suspend (hw.acpi.sleep.state=3) and resume appears to work reliably
 > > many times in a row
 > > - Booting multi user after suspend/resume: wireless iwn0 does not
 > > appear to work "iwn0: could not load firmware .text section"
 >
 > I see that too. I haven't looked into it yet, but wondered if it was
 > as simple as forcing it to reload its firmware after resumption.

Mmm, the man page indicates "iwn0: could not load firmware .text
section" is reported when it attempted to
load the firmware from disk into the device but failed, so it may be a
little more than that :/

 > (Actually, my iwn didn't work at all, originally, because it requires
 > a different firmware file than any that are distributed by NetBSD at
 > present, and needed an addition in the driver to target that firmware.
 > I made those changes in my tree and have been testing with them on
 > both 8 and HEAD.)
 >
 > > netbsd-8 Multi user no x11:
 > > - Suspends, keyboard *usually* non responsive on resume (but can
 > > switch virtual terminals)
 >
 > I've never had this problem, I've found my T420 consistently responsive
 > whether I'm at a console or have suspended with X running (typically
 > with an Xfce4 session). When it comes back, no issues there (aside from
 > iwn).

That's definitely encouraging!

David




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)