Re: svn commit: r306346 - head/sys/kern
On 10/04/2016 15:56, Gleb Smirnoff wrote: > Eric, > > E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c > E> { > E>const struct mtx *m; > E> > E> - if (panicstr != NULL || dumping) > E> + if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) > E>return; > > I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()? > Positive panicstr and dumping imply scheduler stopped. If I read correctly, that's /almost/ true, but the scheduler is only stopped #ifdef SMP and we're panicking, so we need the full expression to cover the !SMP and reboot(RB_DUMP) cases. Eric ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r306346 - head/sys/kern
On Thu, Oct 06, 2016 at 02:08:46PM +1100, Bruce Evans wrote: > On Wed, 5 Oct 2016, Slawa Olhovchenkov wrote: > > > On Wed, Oct 05, 2016 at 11:19:10AM +1100, Bruce Evans wrote: > > > >> On Tue, 4 Oct 2016, Gleb Smirnoff wrote: > >> > >>> On Mon, Sep 26, 2016 at 03:30:30PM +, Eric van Gyzen wrote: > >>> E> ... > >>> E> Modified: head/sys/kern/kern_mutex.c > >>> E> > >>> == > >>> E> --- head/sys/kern/kern_mutex.c Mon Sep 26 15:03:31 2016 > >>> (r306345) > >>> E> +++ head/sys/kern/kern_mutex.c Mon Sep 26 15:30:30 2016 > >>> (r306346) > >>> E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c > >>> E> { > >>> E>const struct mtx *m; > >>> E> > >>> E> - if (panicstr != NULL || dumping) > >>> E> + if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) > >>> E>return; > >>> > >>> I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()? > >>> Positive panicstr and dumping imply scheduler stopped. > >> > >> 'dumping' doesn't imply SCHEDULER_STOPPED(). > >> > >> Checking 'dumping' here seems to be just an old bug. It just breaks > >> __mtx_assert(), while all other mutex operations work normally for dumping > >> without panicing. > > > > [...] > > > > Is this related to halted (not reboted) 11.0 after ~^B and `panic`? > > There might be related problems, but I don't see any here. > > > What I see on serial console: > > = > > db> panic > > panic: from debugger > > I wouldn't trust panic from the debugger, but it is safer than dump > from the debugger (both are ddb commands, but this is another bug). > > > cpuid = 1 > > KDB: stack backtrace: > > db_trace_self_wrapper() at 0x8031fadb = > > db_trace_self_wrapper+0x2b/frame 0xfe1f9e198120 > > vpanic() at 0x804a0302 = vpanic+0x182/frame 0xfe1f9e1981a0 > > panic() at 0x804a0383 = panic+0x43/frame 0xfe1f9e198200 > > db_panic() at 0x8031d987 = db_panic+0x17/frame 0xfe1f9e198210 > > db_command() at 0x8031d019 = db_command+0x299/frame > > 0xfe1f9e1982e0 > > db_command_loop() at 0x8031cd74 = db_command_loop+0x64/frame > > 0xfe1f9e1982f0 > > db_trap() at 0x8031fc1b = db_trap+0xdb/frame 0xfe1f9e198380 > > kdb_trap() at 0x804dd8c3 = kdb_trap+0x193/frame 0xfe1f9e198410 > > trap() at 0x806e3065 = trap+0x255/frame 0xfe1f9e198620 > > calltrap() at 0x806cafd1 = calltrap+0x8/frame 0xfe1f9e198620 > > --- trap 0x3, rip = 0x804dd11e, rsp = 0xfe1f9e1986f0, rbp = > > 0xfe1f9e198710 --- > > kdb_alt_break_internal() at 0x804dd11e = > > kdb_alt_break_internal+0x18e/frame 0xfe1f9e198710 > > kdb_alt_break() at 0x804dcf8b = kdb_alt_break+0xb/frame > > 0xfe1f9e198720 > > uart_intr_rxready() at 0x803e38a8 = uart_intr_rxready+0x98/frame > > 0xfe1f9e198750 > > uart_intr() at 0x803e4621 = uart_intr+0x121/frame 0xfe1f9e198790 > > intr_event_handle() at 0x8046c74b = intr_event_handle+0x9b/frame > > 0xfe1f9e1987e0 > > intr_execute_handlers() at 0x8076d2d8 = > > intr_execute_handlers+0x48/frame 0xfe1f9e198810 > > lapic_handle_intr() at 0x8077163f = lapic_handle_intr+0x3f/frame > > 0xfe1f9e198830 > > Xapic_isr1() at 0x806cb6b7 = Xapic_isr1+0xb7/frame > > 0xfe1f9e198830 > > --- interrupt, rip = 0x8032fedf, rsp = 0xfe1f9e198900, rbp = > > 0xfe1f9e198940 --- > > acpi_cpu_idle() at 0x8032fedf = acpi_cpu_idle+0x2af/frame > > 0xfe1f9e198940 > > cpu_idle_acpi() at 0x8076ad1f = cpu_idle_acpi+0x3f/frame > > 0xfe1f9e198960 > > cpu_idle() at 0x8076adc5 = cpu_idle+0x95/frame 0xfe1f9e198980 > > sched_idletd() at 0x804cbbe5 = sched_idletd+0x495/frame > > 0xfe1f9e198a70 > > fork_exit() at 0x8046a211 = fork_exit+0x71/frame 0xfe1f9e198ab0 > > fork_trampoline() at 0x806cb50e = fork_trampoline+0xe/frame > > 0xfe1f9e198ab0 > > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > > This looks like a normal kdb entry then a not so normal panic from ddb, > but no problems. Yes, I am just capture all output from console after command (`panic`). > > Uptime: 1d4h53m19s > > Dumping 12148 out of 131020 > > MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > > Dump complete > > mps2: Sending StopUnit: path (xpt0:mps2:0:14:): handle 12 > > mps2: Incrementing SSU count > > mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 > > mps2: Incrementing SSU count > > = > > > > This is normal reboot (by /sbin/reboot): > > Is the above just a hung dump from reboot, before going near ddb? That > case should work, but perhaps it needs to be more careful about waiting > for the other CPUs. Just stopping them is no good since it gives an > even more fragile environment, like panicing or entering ddb. Above is attempt to
Re: svn commit: r306346 - head/sys/kern
On Wed, 5 Oct 2016, Slawa Olhovchenkov wrote: On Wed, Oct 05, 2016 at 11:19:10AM +1100, Bruce Evans wrote: On Tue, 4 Oct 2016, Gleb Smirnoff wrote: On Mon, Sep 26, 2016 at 03:30:30PM +, Eric van Gyzen wrote: E> ... E> Modified: head/sys/kern/kern_mutex.c E> == E> --- head/sys/kern/kern_mutex.cMon Sep 26 15:03:31 2016(r306345) E> +++ head/sys/kern/kern_mutex.cMon Sep 26 15:30:30 2016(r306346) E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c E> { E> const struct mtx *m; E> E> - if (panicstr != NULL || dumping) E> + if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) E> return; I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()? Positive panicstr and dumping imply scheduler stopped. 'dumping' doesn't imply SCHEDULER_STOPPED(). Checking 'dumping' here seems to be just an old bug. It just breaks __mtx_assert(), while all other mutex operations work normally for dumping without panicing. [...] Is this related to halted (not reboted) 11.0 after ~^B and `panic`? There might be related problems, but I don't see any here. What I see on serial console: = db> panic panic: from debugger I wouldn't trust panic from the debugger, but it is safer than dump from the debugger (both are ddb commands, but this is another bug). cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at 0x8031fadb = db_trace_self_wrapper+0x2b/frame 0xfe1f9e198120 vpanic() at 0x804a0302 = vpanic+0x182/frame 0xfe1f9e1981a0 panic() at 0x804a0383 = panic+0x43/frame 0xfe1f9e198200 db_panic() at 0x8031d987 = db_panic+0x17/frame 0xfe1f9e198210 db_command() at 0x8031d019 = db_command+0x299/frame 0xfe1f9e1982e0 db_command_loop() at 0x8031cd74 = db_command_loop+0x64/frame 0xfe1f9e1982f0 db_trap() at 0x8031fc1b = db_trap+0xdb/frame 0xfe1f9e198380 kdb_trap() at 0x804dd8c3 = kdb_trap+0x193/frame 0xfe1f9e198410 trap() at 0x806e3065 = trap+0x255/frame 0xfe1f9e198620 calltrap() at 0x806cafd1 = calltrap+0x8/frame 0xfe1f9e198620 --- trap 0x3, rip = 0x804dd11e, rsp = 0xfe1f9e1986f0, rbp = 0xfe1f9e198710 --- kdb_alt_break_internal() at 0x804dd11e = kdb_alt_break_internal+0x18e/frame 0xfe1f9e198710 kdb_alt_break() at 0x804dcf8b = kdb_alt_break+0xb/frame 0xfe1f9e198720 uart_intr_rxready() at 0x803e38a8 = uart_intr_rxready+0x98/frame 0xfe1f9e198750 uart_intr() at 0x803e4621 = uart_intr+0x121/frame 0xfe1f9e198790 intr_event_handle() at 0x8046c74b = intr_event_handle+0x9b/frame 0xfe1f9e1987e0 intr_execute_handlers() at 0x8076d2d8 = intr_execute_handlers+0x48/frame 0xfe1f9e198810 lapic_handle_intr() at 0x8077163f = lapic_handle_intr+0x3f/frame 0xfe1f9e198830 Xapic_isr1() at 0x806cb6b7 = Xapic_isr1+0xb7/frame 0xfe1f9e198830 --- interrupt, rip = 0x8032fedf, rsp = 0xfe1f9e198900, rbp = 0xfe1f9e198940 --- acpi_cpu_idle() at 0x8032fedf = acpi_cpu_idle+0x2af/frame 0xfe1f9e198940 cpu_idle_acpi() at 0x8076ad1f = cpu_idle_acpi+0x3f/frame 0xfe1f9e198960 cpu_idle() at 0x8076adc5 = cpu_idle+0x95/frame 0xfe1f9e198980 sched_idletd() at 0x804cbbe5 = sched_idletd+0x495/frame 0xfe1f9e198a70 fork_exit() at 0x8046a211 = fork_exit+0x71/frame 0xfe1f9e198ab0 fork_trampoline() at 0x806cb50e = fork_trampoline+0xe/frame 0xfe1f9e198ab0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- This looks like a normal kdb entry then a not so normal panic from ddb, but no problems. Uptime: 1d4h53m19s Dumping 12148 out of 131020 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Dump complete mps2: Sending StopUnit: path (xpt0:mps2:0:14:): handle 12 mps2: Incrementing SSU count mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 mps2: Incrementing SSU count = This is normal reboot (by /sbin/reboot): Is the above just a hung dump from reboot, before going near ddb? That case should work, but perhaps it needs to be more careful about waiting for the other CPUs. Just stopping them is no good since it gives an even more fragile environment, like panicing or entering ddb. === Sending StopUnit: path (xpt0:mps2:0:14:): handle 13 mps2: Incrementing SSU count mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 mps2: Incrementing SSU count mps2: Decrementing SSU count. mps2: Completing stop unit for (xpt0:mps2:0:18:): mps2: Decrementing SSU count. mps2: Completing stop unit for (xpt0:mps2:0:14:): === mps2: lagg0: link state changed to DOWN Sending StopUnit: path (xpt0:mps2:0:14:): handle 12 mps2: Incrementing SSU count mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 mps2: Incrementing SSU count mps2:
Re: svn commit: r306346 - head/sys/kern
On Wed, Oct 05, 2016 at 11:19:10AM +1100, Bruce Evans wrote: > On Tue, 4 Oct 2016, Gleb Smirnoff wrote: > > > On Mon, Sep 26, 2016 at 03:30:30PM +, Eric van Gyzen wrote: > > E> ... > > E> Modified: head/sys/kern/kern_mutex.c > > E> > > == > > E> --- head/sys/kern/kern_mutex.c Mon Sep 26 15:03:31 2016 > > (r306345) > > E> +++ head/sys/kern/kern_mutex.c Mon Sep 26 15:30:30 2016 > > (r306346) > > E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c > > E> { > > E> const struct mtx *m; > > E> > > E> -if (panicstr != NULL || dumping) > > E> +if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) > > E> return; > > > > I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()? > > Positive panicstr and dumping imply scheduler stopped. > > 'dumping' doesn't imply SCHEDULER_STOPPED(). > > Checking 'dumping' here seems to be just an old bug. It just breaks > __mtx_assert(), while all other mutex operations work normally for dumping > without panicing. [...] Is this related to halted (not reboted) 11.0 after ~^B and `panic`? What I see on serial console: = db> panic panic: from debugger cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at 0x8031fadb = db_trace_self_wrapper+0x2b/frame 0xfe1f9e198120 vpanic() at 0x804a0302 = vpanic+0x182/frame 0xfe1f9e1981a0 panic() at 0x804a0383 = panic+0x43/frame 0xfe1f9e198200 db_panic() at 0x8031d987 = db_panic+0x17/frame 0xfe1f9e198210 db_command() at 0x8031d019 = db_command+0x299/frame 0xfe1f9e1982e0 db_command_loop() at 0x8031cd74 = db_command_loop+0x64/frame 0xfe1f9e1982f0 db_trap() at 0x8031fc1b = db_trap+0xdb/frame 0xfe1f9e198380 kdb_trap() at 0x804dd8c3 = kdb_trap+0x193/frame 0xfe1f9e198410 trap() at 0x806e3065 = trap+0x255/frame 0xfe1f9e198620 calltrap() at 0x806cafd1 = calltrap+0x8/frame 0xfe1f9e198620 --- trap 0x3, rip = 0x804dd11e, rsp = 0xfe1f9e1986f0, rbp = 0xfe1f9e198710 --- kdb_alt_break_internal() at 0x804dd11e = kdb_alt_break_internal+0x18e/frame 0xfe1f9e198710 kdb_alt_break() at 0x804dcf8b = kdb_alt_break+0xb/frame 0xfe1f9e198720 uart_intr_rxready() at 0x803e38a8 = uart_intr_rxready+0x98/frame 0xfe1f9e198750 uart_intr() at 0x803e4621 = uart_intr+0x121/frame 0xfe1f9e198790 intr_event_handle() at 0x8046c74b = intr_event_handle+0x9b/frame 0xfe1f9e1987e0 intr_execute_handlers() at 0x8076d2d8 = intr_execute_handlers+0x48/frame 0xfe1f9e198810 lapic_handle_intr() at 0x8077163f = lapic_handle_intr+0x3f/frame 0xfe1f9e198830 Xapic_isr1() at 0x806cb6b7 = Xapic_isr1+0xb7/frame 0xfe1f9e198830 --- interrupt, rip = 0x8032fedf, rsp = 0xfe1f9e198900, rbp = 0xfe1f9e198940 --- acpi_cpu_idle() at 0x8032fedf = acpi_cpu_idle+0x2af/frame 0xfe1f9e198940 cpu_idle_acpi() at 0x8076ad1f = cpu_idle_acpi+0x3f/frame 0xfe1f9e198960 cpu_idle() at 0x8076adc5 = cpu_idle+0x95/frame 0xfe1f9e198980 sched_idletd() at 0x804cbbe5 = sched_idletd+0x495/frame 0xfe1f9e198a70 fork_exit() at 0x8046a211 = fork_exit+0x71/frame 0xfe1f9e198ab0 fork_trampoline() at 0x806cb50e = fork_trampoline+0xe/frame 0xfe1f9e198ab0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 1d4h53m19s Dumping 12148 out of 131020 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Dump complete mps2: Sending StopUnit: path (xpt0:mps2:0:14:): handle 12 mps2: Incrementing SSU count mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 mps2: Incrementing SSU count = This is normal reboot (by /sbin/reboot): === Sending StopUnit: path (xpt0:mps2:0:14:): handle 13 mps2: Incrementing SSU count mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 mps2: Incrementing SSU count mps2: Decrementing SSU count. mps2: Completing stop unit for (xpt0:mps2:0:18:): mps2: Decrementing SSU count. mps2: Completing stop unit for (xpt0:mps2:0:14:): === mps2: lagg0: link state changed to DOWN Sending StopUnit: path (xpt0:mps2:0:14:): handle 12 mps2: Incrementing SSU count mps2: Sending StopUnit: path (xpt0:mps2:0:18:): handle 9 mps2: Incrementing SSU count mps2: Decrementing SSU count. mps2: Completing stop unit for (xpt0:mps2:0:18:): mps2: Decrementing SSU count. mps2: Completing stop unit for (xpt0:mps2:0:14:): ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r306346 - head/sys/kern
On Tue, 4 Oct 2016, Gleb Smirnoff wrote: On Mon, Sep 26, 2016 at 03:30:30PM +, Eric van Gyzen wrote: E> ... E> Modified: head/sys/kern/kern_mutex.c E> == E> --- head/sys/kern/kern_mutex.cMon Sep 26 15:03:31 2016(r306345) E> +++ head/sys/kern/kern_mutex.cMon Sep 26 15:30:30 2016(r306346) E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c E> { E> const struct mtx *m; E> E> - if (panicstr != NULL || dumping) E> + if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) E> return; I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()? Positive panicstr and dumping imply scheduler stopped. 'dumping' doesn't imply SCHEDULER_STOPPED(). Checking 'dumping' here seems to be just an old bug. It just breaks __mtx_assert(), while all other mutex operations work normally for dumping without panicing. kern doesn't have this bug anywhere else. It just has style bugs for most references to 'dumping': X kern_mutex.c: * re-enable interrupts while dumping core. This is under another recent fix involving SCHEDULER_STOPPED(). X kern_mutex.c: if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) Broken. X kern_shutdown.c:int dumping; /* system is dumping */ Banal comment. X kern_shutdown.c: if (dumping) X kern_shutdown.c: dumping++; X kern_shutdown.c: dumping--; Obfuscation of a boolean by manually optimizing its setting for PDP-11. X kern_shutdown.c: if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping) Missing spaced around binary operator. X sched_4bsd.c: if (panicstr != NULL || pri >= cpri || cold /* || dumping */ || Here the bogus test for dumping is commented out. This is in maybe_preempt(). There is a TD_IS_INHIBITED() check. sched_ule.c has similar code without the commented-out 'dumping'. Neither checks SCHEDULER_STOPPED(). It is certainly useless to preempt if SCHEDULER_STOPPED(), but perhaps checking it is unnecessary. I don't like the design or implementation of SCHEDULER_STOPPED(). It is a hack to specially break mutexes while panicing. Panicing stops the scheduler and tries to stop all other CPUs (fixed in my version to either actually stop them all, with special stopping for NMI handlers, or hang waiting) and we set the flag curthread->td_stopsched to indicate that the scheduler is specially stopped for panic. Bugs in the implementation of this include: - 2 bytes are wasted in struct thread to hold the flag. This is a dubious obfuscation of an old version that used a global flag - despite this optimization, all mutex operations should be slowed down by testing this flag - however, the inlined mutex operations don't test this flag. This gives inconsistencies. __mtx_assert() needed the fix in this commit to not detect these inconsistencies Bugs in the design of this include: - SCHEDULER_STOPPED() doesn't really mean that the scheduler has stopped. It means that we are panicing and have (tried to) stop other CPUs and want to forcer all mutex operations to silently succeed without keeping the mutex state consistent. This is fragile. Nothing can depend on mutexes working or mutex assertions finding inconsistencies or on mtx_owned() working, so the SCHEDULER_STOPPED() check must be done in more than central mutex code. It is confusing that this condition doesn't mean that the scheduler is stopped. It means that a certain state in panicing has been reached. Bruce ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r306346 - head/sys/kern
Eric, On Mon, Sep 26, 2016 at 03:30:30PM +, Eric van Gyzen wrote: E> Author: vangyzen E> Date: Mon Sep 26 15:30:30 2016 E> New Revision: 306346 E> URL: https://svnweb.freebsd.org/changeset/base/306346 E> E> Log: E> Make no assertions about mutex state when the scheduler is stopped. E> E> This changes the assert path to match the lock and unlock paths. E> E> MFC after: 1 week E> Sponsored by: Dell EMC E> E> Modified: E> head/sys/kern/kern_mutex.c E> E> Modified: head/sys/kern/kern_mutex.c E> == E> --- head/sys/kern/kern_mutex.c Mon Sep 26 15:03:31 2016 (r306345) E> +++ head/sys/kern/kern_mutex.c Mon Sep 26 15:30:30 2016 (r306346) E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c E> { E> const struct mtx *m; E> E> -if (panicstr != NULL || dumping) E> +if (panicstr != NULL || dumping || SCHEDULER_STOPPED()) E> return; I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()? Positive panicstr and dumping imply scheduler stopped. -- Totus tuus, Glebius. ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"