RE: [Bug 227404] UP FreeBSD VM always hangs on reboot since 20180329-r331740

2018-04-10 Thread Dexuan Cui via freebsd-bugs
> From: Bruce Evans 
> Sent: Tuesday, April 10, 2018 13:09
> > Here the bug is that UP FreeBSD VM hangs on reboot or power-off, and
> > I'm sure this recent patch (which was committed by Jeff on Mar 26) caused
> > this bug:
> > r331561: Fix a bug introduced in r329612 that slowly invalidates all clean
> > bufs.
> >
> > However, SMP VM with 2 or more CPUs doesn't hang on reboot/power-off
> > according to our tests.
> 
> Actually, r329612 is what causes this bug.  I already did the bisection
> to find almost this bug a couple of weeks ago.  The hang occurs on amd64
> with 4 CPUs but not on amd64 with 8 CPUs or i386 with 4 or 8 CPUs.  I
> just checked that it occurs on i386 with 1 CPU.  All on the same machine.
> But r329611 doesn't hang for any of these cases.

So it looks to me like r329612 introduced a hang, Jeff made r331561 to try
to fix it, but the issue does not seem to be completely fixed (at least for
me).

I didn't test r329612.

We noticed that our amd64 VM (which has a single CPU) hung on reboot. The VM
kernel was built from yesterday's latest code with the default GENERIC kernel
config.

However, using the same kernel binary, if we configure 2 or more CPUs for
the VM, it doesn't hang on reboot.

If I use the latest code but manually remove the changes made by r331561,
the hang with our single-CPU VM goes away.

I hope the info is helpful.
 
> I still think there is an older bug, but now think it is related.  I
> only tested with SCHED_4BSD.  For SCHED_4BSD, I suspect that the bug
> is from pinning a thread to a CPU and then stopping that CPU.  Pure
> UP has no problems since pinning is null for it.  SCHED_4BSD has especially
> special handling for SMP (a separate runq for each CPU.  I have been
> modifying SCHED_4BSD and the separate queues mostly get in the way).
> 
> Bruce
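Bruce's hypothesis can be illustrated with a toy model, assuming the SCHED_4BSD layout he mentions: one global run queue plus a separate run queue per CPU, where a pinned thread is placed only on its own CPU's queue. The class, method, and thread names below (`ToyScheduler`, `stop_cpu`, `td_pinned`, and so on) are hypothetical and this is not kernel code; it only shows why a thread left on a stopped CPU's private queue is never chosen, whereas "pure UP" has no such queue to strand it on.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set


class ToyScheduler:
    """Hypothetical model: a global run queue plus one private queue per CPU."""

    def __init__(self, ncpus: int) -> None:
        self.global_rq: Deque[str] = deque()
        self.pcpu_rq: Dict[int, Deque[str]] = {c: deque() for c in range(ncpus)}
        self.online: Set[int] = set(range(ncpus))

    def enqueue(self, td: str, pinned_to: Optional[int] = None) -> None:
        # Unpinned threads may run on any CPU; pinned ones only on their own.
        if pinned_to is None:
            self.global_rq.append(td)
        else:
            self.pcpu_rq[pinned_to].append(td)

    def stop_cpu(self, cpu: int) -> None:
        # A stopped CPU never schedules again, so its private queue is orphaned.
        self.online.discard(cpu)

    def choose(self, cpu: int) -> Optional[str]:
        # An online CPU prefers its own queue, then falls back to the global one.
        if cpu not in self.online:
            return None
        if self.pcpu_rq[cpu]:
            return self.pcpu_rq[cpu].popleft()
        if self.global_rq:
            return self.global_rq.popleft()
        return None

    def run_all(self) -> List[str]:
        """Let the online CPUs pick threads until nothing more can run."""
        ran: List[str] = []
        progress = True
        while progress:
            progress = False
            for cpu in sorted(self.online):
                td = self.choose(cpu)
                if td is not None:
                    ran.append(td)
                    progress = True
        return ran

    def stranded(self) -> List[str]:
        """Threads still queued on CPUs that have been stopped."""
        return [td for cpu, q in sorted(self.pcpu_rq.items())
                if cpu not in self.online for td in q]


sched = ToyScheduler(ncpus=4)
sched.enqueue("td_free")                # may run on any CPU
sched.enqueue("td_pinned", pinned_to=2) # hypothetically pinned to CPU 2
sched.stop_cpu(2)
print(sched.run_all())                  # ['td_free']
print(sched.stranded())                 # ['td_pinned']: it is never chosen
```

If CPU 2 is left running, the same pinned thread is picked up normally, which is why the failure only appears when pinning and CPU stopping interact.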

I always use the default GENERIC kernel options, so I guess I'm using
SCHED_4BSD(?).

Thanks,
-- Dexuan
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


RE: [Bug 227404] UP FreeBSD VM always hangs on reboot since 20180329-r331740

2018-04-10 Thread Bruce Evans

On Tue, 10 Apr 2018, Dexuan Cui wrote:

>> From: Bruce Evans
>> Sent: Tuesday, April 10, 2018 00:45
>>
>> On Tue, 10 Apr 2018 a bug that doesn't want repl...@freebsd.org wrote:
>>
>> (The bug didn't even Cc freebsd-bugs for this followup.)
>
> Thanks for the reminder! I Cc'd bugs@ just now.
>
>>> --- Comment #4 from Dexuan Cui ---
>>
>> ...
>>
>> I think I saw this a few months before that.
>>
>> My only history of this is that I built a UP kernel on 17 Dec 2017 to see
>> if UP kernels had the bug.  So SMP kernels probably had the bug then.
>>
>> Bruce
>
> Here the bug is that UP FreeBSD VM hangs on reboot or power-off, and
> I'm sure this recent patch (which was committed by Jeff on Mar 26) caused
> this bug:
> https://github.com/freebsd/freebsd/commit/63a483ed5f4eaadb8979992c7a5de24c7a471c61
> ("Fix a bug introduced in r329612 that slowly invalidates all clean bufs").
>
> However, SMP VM with 2 or more CPUs doesn't hang on reboot/power-off
> according to our tests.

Actually, r329612 is what causes this bug.  I already did the bisection
to find almost this bug a couple of weeks ago.  The hang occurs on amd64
with 4 CPUs but not on amd64 with 8 CPUs or i386 with 4 or 8 CPUs.  I
just checked that it occurs on i386 with 1 CPU.  All on the same machine.
But r329611 doesn't hang for any of these cases.
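The bisection amounts to a binary search over revisions that ends with r329611 as the last good one and r329612 as the first bad one. A minimal sketch, assuming the failure is monotonic (every revision after the first bad one is also bad) and using a hypothetical `hangs_on_reboot` oracle in place of actually building and rebooting a kernel at each revision; the revision range is illustrative only.

```python
from typing import Callable, Sequence


def first_bad(revs: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the first revision in revs for which is_bad() is True.

    Assumes revs is sorted, revs[0] is known good, revs[-1] is known bad,
    and that once a revision is bad every later one is bad too.
    """
    good, bad = 0, len(revs) - 1          # indices of known-good / known-bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revs[mid]):
            bad = mid                     # first bad is at or before mid
        else:
            good = mid                    # first bad is after mid
    return revs[bad]


# Hypothetical oracle: pretend every revision from r329612 on hangs at reboot.
def hangs_on_reboot(rev: int) -> bool:
    return rev >= 329612


revs = list(range(329000, 331741))        # r329000 .. r331740 (illustrative)
print(first_bad(revs, hangs_on_reboot))   # 329612
```

Because the hang depends on the CPU configuration (4 CPUs on amd64 or 1 CPU on i386 here, but not 8), the real oracle has to run in a configuration that reproduces it; otherwise a bad revision tests as good and the search converges on the wrong commit.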

XX From b...@optusnet.com.au Fri Mar 23 20:06:40 2018 +1100
XX Date: Fri, 23 Mar 2018 20:06:39 +1100 (EST)
XX From: Bruce Evans 
XX To: j...@freebsd.org
XX Subject: r329612 breaks sync for shutdown
XX Message-ID: <20180323192409.f1...@besplex.bde.org>
XX 
XX r329612 (with or without later changes) sometimes (consistently in one
XX hardware configuration) hangs in clean shutdowns by init:
XX 
XX i386 with 8 or 4 CPUs: it hasn't failed yet
XX amd64 with 8 CPUs: it hasn't failed yet
XX amd64 with 4 CPUs (by turning off HTT): it always or almost always hangs
XX 
XX The hang is usually in "Syncing disks, vnodes remaining... 0 ".  Less than
XX 10% of the time it hangs earlier in "Waiting ... for `syncer' ...".
XX It is just waiting for a wakeup that never arrives.  In the working case,
XX it prints about 2 more 0's, with about half a second between each.  If it
XX reaches the second 0, it always completes.
XX 
XX This is with SCHED_4BSD.  SCHED_ULE seems to work.  I recently looked for
XX missing wakeups and found some for idle threads in mwait.  This affects
XX both schedulers but fixing it makes little difference.  The bug is invariant
XX under other large changes in options and code.
XX 
XX XX KDB: enter: Break to debugger
XX XX [ thread pid 9 tid 100069 ]
XX XX Stopped at      kdb_enter+0x3a: movq    $0,0x700c97(%rip)
XX XX db> ps
XX XX   pid  ppid  pgrp   uid  state   wmesg       wchan            cmd
XX XX    18     0     0     0  DL      -           0x80be8462       [schedcpu]
XX XX    17     0     0     0  DL      kpsusp      0xf80003e1a6e0   [vnlru]
XX XX    16     0     0     0  RL                                   [syncer]
XX XX     9     0     0     0  RL      (threaded)                   [bufdaemon]
XX XX 100059                   RunQ                                 [bufdaemon]
XX XX 100064                   Run     CPU 1                        [bufspacedaemon-0]
XX XX 100065                   Run     CPU 3                        [bufspacedaemon-1]
XX XX 100066                   RunQ                                 [bufspacedaemon-2]
XX XX 100067                   RunQ                                 [bufspacedaemon-3]
XX XX 100068                   Run     CPU 0                        [bufspacedaemon-4]
XX XX 100069                   Run     CPU 2                        [bufspacedaemon-5]
XX XX 100070                   CanRun                               [bufspacedaemon-6]
XX XX     8     0     0     0  DL      (threaded)                   [pagedaemon]
XX XX 100058                   D       psleep      0x80c7b82d       [pagedaemon]
XX XX 100062                   D       launds      0x80c7b834       [laundry: dom0]
XX XX 100063                   D       umarcl      0x80676967       [uma]
XX XX     7     0     0     0  DL      -           0x80c73dd4       [soaiod4]
XX XX     6     0     0     0  DL      -           0x80c73dd4       [soaiod3]
XX XX     5     0     0     0  DL      -           0x80c73dd4       [soaiod2]
XX XX     4     0     0     0  DL      -           0x80c73dd4       [soaiod1]
XX XX    15     0     0     0  DL      cooling     0xf8000186a758   [acpi_cooling1]
XX XX    14     0     0     0  DL      tzpoll      0x80aaa110       [acpi_thermal]
XX XX     3     0     0     0  DL      -           0x80aab218       [rand_harvestq]
XX XX    13     0     0     0  DL      (threaded)                   [usb]
XX XX 100023                   D       -           0xfe00839d4460   [usbus0]
XX XX 100024                   D       -           0xfe00839d44b8
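Bruce describes the shutdown as "just waiting for a wakeup that never arrives" and mentions looking for missing wakeups. As a generic illustration only (this is not the FreeBSD shutdown or syncer code), the Python sketch below shows the classic lost-wakeup pattern: a notification sent before anyone is sleeping is not remembered, so a sleeper that does not re-check its condition under the lock waits until its timeout. FreeBSD's `wakeup()` likewise does nothing when no thread is sleeping on the channel, which is why sleepers check their condition while holding the interlock passed to `msleep()`. The function names here are hypothetical.

```python
import threading


def sleep_without_recheck(cond: threading.Condition, timeout: float) -> bool:
    # Buggy pattern: sleep unconditionally and trust that a notify will come.
    with cond:
        return cond.wait(timeout)       # False means the timeout expired


def sleep_with_recheck(cond: threading.Condition, state: dict,
                       timeout: float) -> bool:
    # Safe pattern: test the condition under the lock before (and after) sleeping.
    with cond:
        return cond.wait_for(lambda: state["done"], timeout)


cond = threading.Condition()
state = {"done": False}

# The waker runs first: it records the event and notifies while nobody is
# sleeping yet, so the notification itself is not remembered anywhere.
with cond:
    state["done"] = True
    cond.notify_all()

print(sleep_without_recheck(cond, timeout=0.2))      # False: the wakeup was lost
print(sleep_with_recheck(cond, state, timeout=0.2))  # True: condition already holds
```

Without the timeout, the first sleeper would block forever, which is the shape of the symptom described above.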

RE: [Bug 227404] UP FreeBSD VM always hangs on reboot since 20180329-r331740

2018-04-10 Thread Dexuan Cui via freebsd-bugs
> From: Bruce Evans
> Sent: Tuesday, April 10, 2018 00:45
> 
> On Tue, 10 Apr 2018 a bug that doesn't want repl...@freebsd.org wrote:
> 
> (The bug didn't even Cc freebsd-bugs for this followup.)

Thanks for the reminder! I Cc'd bugs@ just now.

> > --- Comment #4 from Dexuan Cui ---
> ...
> 
> I think I saw this a few months before that.
> 
> My only history of this is that I built a UP kernel on 17 Dec 2017 to see
> if UP kernels had the bug.  So SMP kernels probably had the bug then.
> 
> Bruce

Here the bug is that UP FreeBSD VM hangs on reboot or power-off, and
I'm sure this recent patch (which was committed by Jeff on Mar 26) caused this 
bug:
https://github.com/freebsd/freebsd/commit/63a483ed5f4eaadb8979992c7a5de24c7a471c61
("Fix a bug introduced in r329612 that slowly invalidates all clean bufs").

However, SMP VM with 2 or more CPUs doesn't hang on reboot/power-off
according to our tests.

Thanks,
-- Dexuan


Re: [Bug 227404] UP FreeBSD VM always hangs on reboot since 20180329-r331740

2018-04-10 Thread Bruce Evans

On Tue, 10 Apr 2018 a bug that doesn't want repl...@freebsd.org wrote:

(The bug didn't even Cc freebsd-bugs for this followup.)

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
>
> --- Comment #4 from Dexuan Cui ---
> I think the first bad patch is this one:
> https://github.com/freebsd/freebsd/commit/63a483ed5f4eaadb8979992c7a5de24c7a471c61
> (Fix a bug introduced in r329612 that slowly invalidates all clean bufs.):
>
> Today's
> https://github.com/freebsd/freebsd/commit/66e8725e8d24141506bc4f458ec7d1a51e86304c
> is broken, but if I revert 63a483ed5f4eaadb8979992c7a5de24c7a471c61, the bug
> cannot be reproduced.
>
> Cc bde & jeff.


I think I saw this a few months before that.

My only history of this is that I built a UP kernel on 17 Dec 2017 to see
if UP kernels had the bug.  So SMP kernels probably had the bug then.

Bruce


Re: [Bug 227404] UP FreeBSD VM always hangs on reboot since 20180329-r331740

2018-04-10 Thread Bruce Evans

On Tue, 10 Apr 2018 a bug that doesn't want repl...@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
> ...
> --- Comment #1 from Dexuan Cui ---
> When the issue happens, the CPU utilization of the UP VM is 100%.
>
> While we're trying to find the first bad revision, it would be great if
> somebody could report whether the issue also happens on bare metal or
> other hypervisors.


This has been happening for at least several months on real hardware too.
SMP kernels hang on a 1-CPU system and on an 8-CPU system with all except
1 CPU turned off in the BIOS.  They don't hang on the 8-CPU system with at
least 2 CPUs turned on.  UP kernels don't hang.  I use SCHED_4BSD.
SCHED_ULE is apparently not much different for this.

Bruce