Re: clean on-disk filesystems through {suspend,hibernate}/resume

2018-01-07 Thread Theo de Raadt
> > BTW, if anyone uses softdep *you have to tell me*, and then try
> > to repeat problems you encounter without softdep.  That is a
> > totally different problem set.
> 
> Yes, I am using softdep.

I am not concerned with the softdep case.  softdep needs a maintainer,
and it isn't me.  I'll provide hints for how to debug this though:

First apply the following diff to the tree.  This will keep the screen
alive during the suspend cycle.  On some inteldrm chipsets it will
fail to resume afterwards, however.  On x230 this works, newer models
cannot handle this hack.  Anyway, the goal is to observe why it
isn't succeeding at completing the suspend sync.

Having the screen alive makes it possible to add printf's to the ffs
softdep code, in particular softdep_sync_metadata() and such
functions.  Figure out what is keeping the code so busy.  Why
does it keep doing IO?  Is it writing data blocks for files?  Is it
repeatedly updating the same metadata?

For this suspend case, the sync functions are being called with various
_WAIT flags instead of _NOWAIT or _LAZY.  It is being asked to achieve
stability.  What stops it from achieving stability?  When you read the
code in the area you'll be shocked at the comments.  Try to figure
out which cases are occurring.

Anyone with rudimentary C skills and patience can do this.  (But I
won't be doing it; I have other things to do.)

Index: i915_drv.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
retrieving revision 1.108
diff -u -p -u -r1.108 i915_drv.c
--- i915_drv.c  30 Sep 2017 07:36:56 -  1.108
+++ i915_drv.c  21 Dec 2017 05:52:54 -
@@ -673,6 +673,8 @@ static int i915_drm_suspend(struct drm_d
pci_power_t opregion_target_state;
int error;
 
+   return 0;
+
/* ignore lid events during suspend */
mutex_lock(&dev_priv->modeset_restore_lock);
dev_priv->modeset_restore = MODESET_SUSPENDED;
@@ -745,6 +747,8 @@ static int i915_drm_suspend_late(struct 
 {
struct drm_i915_private *dev_priv = drm_dev->dev_private;
int ret;
+
+   return 0;
 
ret = intel_suspend_complete(dev_priv);
 



Re: clean on-disk filesystems through {suspend,hibernate}/resume

2018-01-07 Thread Matthias Schmidt
Hi,

* Theo de Raadt wrote:
> 
> BTW, if anyone uses softdep *you have to tell me*, and then try
> to repeat problems you encounter without softdep.  That is a
> totally different problem set.

Yes, I am using softdep.

For testing, I removed softdep, performed all tests again, and ran the
"extract src.tar while suspending" test multiple times, both on /tmp
and /home.  Now the suspend process was quite fast and the filesystems
were marked clean every time.

Cheers

Matthias



Re: clean on-disk filesystems through {suspend,hibernate}/resume

2018-01-06 Thread Theo de Raadt
> 4.  Now the interesting case.  Basically the same as (2) but now I
> extracted src.tar.gz not in /tmp but in /home which is my largest
> partition.  This time, the suspend process did not finish and I
> pulled the plug after some time.

I've heard a report or two of it not completing sync.  I don't know
yet what causes this situation.

BTW, if anyone uses softdep *you have to tell me*, and then try
to repeat problems you encounter without softdep.  That is a
totally different problem set.



Re: clean on-disk filesystems through {suspend,hibernate}/resume

2018-01-06 Thread Matthias Schmidt
Hi,

* Theo de Raadt wrote:
> 
> I would appreciate reports, and later I'll cut this into pieces and
> commit incremental changes.

I ran four tests on an Intel NUC with softraid CRYPTO and a keydisk.
Although the sync+suspend did not finish in one test, it is definitely
an improvement and your work is highly appreciated!

Cheers

Matthias

1. zzz after manual fsync and pulled the plug after suspend.

Works as expected and no softraid errors.

Jan  6 20:59:21 tau /bsd: /var force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/src force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/ports force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/obj force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/local force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr/X11R6 force clean (0 0): fmod 1 clean 1
Jan  6 20:59:21 tau /bsd: /usr force clean (0 0): fmod 1 clean 1
Jan  6 20:59:22 tau /bsd: /tmp force clean (0 0): fmod 1 clean 1
Jan  6 20:59:22 tau /bsd: /home force clean (0 0): fmod 1 clean 1
Jan  6 20:59:22 tau /bsd: / force clean (0 0): fmod 1 clean 1

2. zzz while extracting src.tar.gz on /tmp and pulled the plug after
suspend.

Same result, works as expected.

3. ZZZ after manual fsync.

Same result, works as expected.

4.  Now the interesting case.  Basically the same as (2), but now I
extracted src.tar.gz not in /tmp but in /home, which is my largest
partition.  This time, the suspend process did not finish and I
pulled the plug after some time.  My /home partition was extremely dirty
but the others were marked as clean.  So this is definitely an
improvement over the current situation.

/dev/sd2l: SIZE=762 MTIME=Sep 14 16:04 2016  (RECONNECTED)
/dev/sd2l (1cae2f5f79b7f28f.l): UNREF FILE  I=11553631  OWNER=xhr MODE=100644
[ Hundreds of unreferenced files ]
/dev/sd2l (1cae2f5f79b7f28f.l): FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/sd2l (1cae2f5f79b7f28f.l): SUMMARY INFORMATION BAD (SALVAGED)
/dev/sd2l (1cae2f5f79b7f28f.l): BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/sd2l (1cae2f5f79b7f28f.l): 176550 files, 13148621 used, 91136441 free 
(88761 frags, 11380960 blocks, 0.1% fragmentation)
/dev/sd2l (1cae2f5f79b7f28f.l): MARKING FILE SYSTEM CLEAN
/dev/sd2d (1cae2f5f79b7f28f.d): file system is clean; not checking
/dev/sd2f (1cae2f5f79b7f28f.f): file system is clean; not checking
/dev/sd2g (1cae2f5f79b7f28f.g): file system is clean; not checking
/dev/sd2h (1cae2f5f79b7f28f.h): file system is clean; not checking
/dev/sd2k (1cae2f5f79b7f28f.k): file system is clean; not checking
/dev/sd2j (1cae2f5f79b7f28f.j): file system is clean; not checking
/dev/sd2i (1cae2f5f79b7f28f.i): file system is clean; not checking
/dev/sd2e (1cae2f5f79b7f28f.e): file system is clean; not checking

Jan  6 21:09:25 tau /bsd: /var force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/src force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/ports force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/obj force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/local force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr/X11R6 force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /usr force clean (0 0): fmod 1 clean 1
Jan  6 21:09:25 tau /bsd: /tmp force clean (0 0): fmod 1 clean 1

Both / and /home are missing here and they were both marked as dirty.

Here is my disklabel for reference:

# /dev/rsd2c:
type: SCSI
disk: SCSI disk
label: SR CRYPTO
duid: 1cae2f5f79b7f28f
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 58368
total sectors: 937697393
boundstart: 64
boundend: 937681920
drivedata: 0

16 partitions:
#size   offset  fstype [fsize bsize   cpg]
  a:  2097152   64  4.2BSD   2048 16384 12958 # /
  b: 33820888  2097216swap# none
  c:9376973930  unused
  d:  4194304 35918112  4.2BSD   2048 16384 12958 # /tmp
  e:  4194304 40112416  4.2BSD   2048 16384 12958 # /var
  f:  4194304 44306720  4.2BSD   2048 16384 12958 # /usr
  g:  2097152 48501024  4.2BSD   2048 16384 12958 # /usr/X11R6
  h: 20971520 50598176  4.2BSD   2048 16384 12958 # /usr/local
  i:  8388608 71569696  4.2BSD   2048 16384 12958 # /usr/src
  j:  8399168 79958304  4.2BSD   2048 16384 12958 # /usr/ports
  k:  8385952 88357472  4.2BSD   2048 16384 12958 # /usr/obj
  l:840938496 96743424  4.2BSD   4096 32768 26062 # /home

OpenBSD 6.2-current (GENERIC.MP) #0: Sat Jan  6 20:02:16 CET 2018
x...@tau.xosc.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17047859200 (16258MB)
avail mem = 16524271616 (15758MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b1d5000 (58 entries)
bios0: vendor Intel Corp. 

clean on-disk filesystems through {suspend,hibernate}/resume

2018-01-06 Thread Theo de Raadt
I've been working for about a month to ensure filesystems are
maximally synchronized and/or clean on-disk through a suspend/resume
cycle.

The idea is if a suspend/resume or hibernate/resume sequence gets
broken (by pulling the power+battery during suspend, or similar
circumstances during the hibernate-write sequence), we can be assured
that the filesystems are in the best shape.  And if done correctly,
we'll even have marked-clean filesystems which don't need a fsck, so
that fresh boot is faster.

There is also a similar case when softraid (layers) underlie the
filesystems.  These layers need proper synchronization to disk also.

Previously we've been ignoring this issue, and frankly we've done
mostly fine...

The work starts with a series of changes to suspend.  It is a bit
tricky to synchronize the in-memory soft state of the filesystems to
disk, and to block new in-memory changes from happening.

New allocations of vnodes are made to sleep-spin, so that other
processes cannot proceed to create new files.  All mountpoints are told
to sync their filesystems non-lazily, and locks are held on these
mountpoints so that no new activity can occur.  During this phase, the
number of dangling inodes (nlink == 0) is counted, and if any are
found the on-disk filesystem is marked dirty, otherwise it is marked
clean.  Next, softraid can be told to save its state, but it uses
vnodes, so a hack allows it to bypass the sleep-spin mentioned earlier.
Once the suspend code knows no more tsleeps will occur, it can unwind
the mount locks so there is less to worry about upon resume.

I would appreciate reports, and later I'll cut this into pieces and
commit incremental changes.

Index: dev/acpi/acpi.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.335
diff -u -p -u -r1.335 acpi.c
--- dev/acpi/acpi.c 29 Nov 2017 22:51:01 -  1.335
+++ dev/acpi/acpi.c 5 Jan 2018 17:29:37 -
@@ -30,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #ifdef HIBERNATE
 #include 
@@ -61,6 +63,7 @@
 
 #include "wd.h"
 #include "wsdisplay.h"
+#include "softraid.h"
 
 #ifdef ACPI_DEBUG
 intacpi_debug = 16;
@@ -2438,11 +2441,15 @@ int
 acpi_sleep_state(struct acpi_softc *sc, int sleepmode)
 {
extern int perflevel;
+   extern int vnode_sleep;
extern int lid_action;
int error = ENXIO;
size_t rndbuflen = 0;
char *rndbuf = NULL;
int state, s;
+#if NSOFTRAID > 0
+   extern void sr_quiesce(void);
+#endif
 
switch (sleepmode) {
case ACPI_SLEEP_SUSPEND:
@@ -2481,8 +2488,12 @@ acpi_sleep_state(struct acpi_softc *sc, 
 
 #ifdef HIBERNATE
if (sleepmode == ACPI_SLEEP_HIBERNATE) {
-   uvmpd_hibernate();
+   /*
+* Discard useless memory, then attempt to
+* create a hibernate work area
+*/
hibernate_suspend_bufcache();
+   uvmpd_hibernate();
if (hibernate_alloc()) {
printf("%s: failed to allocate hibernate memory\n",
sc->sc_dev.dv_xname);
@@ -2495,18 +2506,38 @@ acpi_sleep_state(struct acpi_softc *sc, 
if (config_suspend_all(DVACT_QUIESCE))
goto fail_quiesce;
 
-   bufq_quiesce();
-
 #ifdef MULTIPROCESSOR
acpi_sleep_mp();
 #endif
 
+   vnode_sleep = 1;
+   vfs_stall(curproc, 1);
+#if NSOFTRAID > 0
+   sr_quiesce();
+#endif
+   bufq_quiesce();
+
+#ifdef HIBERNATE
+   if (sleepmode == ACPI_SLEEP_HIBERNATE) {
+   /*
+* VFS syncing churned lots of memory; so discard
+* useless memory again, hoping no processes are
+* still allocating..
+*/
+   hibernate_suspend_bufcache();
+   uvmpd_hibernate();
+   }
+#endif /* HIBERNATE */
+
resettodr();
 
s = splhigh();
disable_intr(); /* PSL_I for resume; PIC/APIC broken until repair */
cold = 2;   /* Force other code to delay() instead of tsleep() */
 
+   vfs_stall(curproc, 0);
+   vnode_sleep = 0;
+
if (config_suspend_all(DVACT_SUSPEND) != 0)
goto fail_suspend;
acpi_sleep_clocks(sc, state);
@@ -2568,6 +2599,7 @@ fail_suspend:
 #endif
 
bufq_restart();
+   wakeup(&vnode_sleep);
 
 fail_quiesce:
config_suspend_all(DVACT_WAKEUP);
@@ -2588,6 +2620,8 @@ fail_alloc:
wsdisplay_resume();
rw_enter_write(&sc->sc_lck);
 #endif /* NWSDISPLAY > 0 */
+
+   sys_sync(curproc, NULL, NULL);
 
/* Restore hw.setperf */
if (cpu_setperf != NULL)
Index: dev/softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.389
diff -u -p -u -r1.389 softraid.c
--- dev/softraid.c  21 Dec 2017 07:29:15 -  1.389
+++ dev/softraid.c  6 Jan 2018