On Tue, 28 Nov 2017, Andrew Turner wrote:

Log:
 When we exit the kernel debugger having entered because of a breakpoint
 instruction we need to jump over the instruction. Without this we will
 execute the same instruction again and enter into the debugger again.

 PR:            223917
 Reported by:   emaste
 MFC after:     1 week
 Sponsored by:  DARPA, AFRL

This belongs in the debugger.  ddb has several macros for skipping back and
forth over the breakpoint instruction, and normally returns with it skipped
over.  gdb presumably does the same.  The macros seem to be correct for arm64.

Modified: head/sys/arm64/arm64/trap.c
==============================================================================
--- head/sys/arm64/arm64/trap.c Tue Nov 28 09:34:43 2017        (r326311)
+++ head/sys/arm64/arm64/trap.c Tue Nov 28 11:04:47 2017        (r326312)
@@ -323,7 +323,10 @@ do_el1h_sync(struct thread *td, struct trapframe *fram
                        break;
                }
#endif
-               /* FALLTHROUGH */
+               kdb_trap(exception, 0,
+                   (td->td_frame != NULL) ? td->td_frame : frame);
+               frame->tf_elr -= 4;

This seems to have the wrong sign as well as being fundamentally wrong.
Execution only decrements the pc on exotic arches, and arm64 seems to
be non-exotic, since the BKPT_SKIP() instruction in db_machdep.h for
skipping the pc does frame->tf_elr += BKPT_SIZE.

BKPT_SKIP also does kdb_thrctx->pcb_rip += BKPT_SIZE on amd64, i386 and mips.
This is missing on arm, arm64 and riscv and sparc64.

BKPT_SKIP is missing on powerpc.

On sparc64, BKPT_SKIP also advances a second pc by 2*BKPT_SIZE.

+               break;
        case EXCP_WATCHPT_EL1:
        case EXCP_SOFTSTP_EL1:
#ifdef KDB

Falling through was almost correct.  I don't see anything wrong with
BKPT_SKIP(), so the bug seems to break the usual case where ddb
skips over the breakpoint.  Then the above reverse the skip.  Perhaps
gdb is broken and advances by 8.  Then the above corrects +8 to +4.
Unusual cases are much more complicated and broken.  Sometimes the
instruction needs to be restarted or user has set the pc to an unrelated
value -- then only the debugger knows what the pc should be.

Old bugs in the fall-through case:
- non-bug: when KDB is not configured, panic is called.  This is correct,
  especially for the breakpoint case since the instruction stream might
  be corrupt in this case.
- when KDB configured but no backend is attached, kdb_trap() fails and
  panic should be called then, as on x86, for the same reasons as when KDB
  is not configured.
- it is a style bug to ifdef the call to kdb_trap() with KDB.  When KDB is
  not configured, kdb_trap() fails and correct code checks for this and
  panics, as on x86.
- non-bug: since the hardware doesn't advance the pc on arm64, the
  instruction stream is actually never corrupt when kdb_trap() fails
- kdb_trap() can also fail when ddb is attached but the console is
  transiently unavailable.  Then the instruction stream is corrupt in some
  cases, but it is wrong to turn console unavailability into a panic.  This
  is partly fixed in my version, but fixing up the instruction stream and
  returning success from ddb (with the usual complications for continuing
  over a breakpoint).  Console unavailability remains very broken as
  implemented -- it does little except try to give this panic but not give
  it in many cases.

New bugs given by not falling through:
- non-bug: it is missing the style bug of the KDB ifdef
- it is missing the panic in the case where KDB is not configured
- non-new-bug: when KDB is not configured it is missing the error check and
  panic as before.

One case that is clearly broken now but probably worked before is when
multiple CPUs hit the same breakpoint and the breakpoint is removed on
one of them that is not the last one to enter ddb.  Then the correct
behaviour is:

- enter with all pc's advanced or not over the breakpoint instruction
  (assume that it is a specialy instruction written into the instruction
  stream; this corrupts the stream and must be fixed up)
- in each ddb instance, unadvance the pc if necessary to the breakpoint
  instruction
- determine if the pc is for a breakpoint set by us.  Until the breakpoint
  is unset by one of the ddb instances, find that it was set by us.
  Continue normally in this case (put back the original instruction and
  single-step over it.  This commit seems to completely break this handling,
  even for a single ddb instance).  Then get a single-step trap and put
  back the breakpoint instruction.)
- there is not enough synchronization, so ddb loses control when it returns
  to single-step.  Other ddb instances hitting the breakpoint see the
  original instruction while racing with the previous step.  They should
  put it back over itself and do much the same.
- eventually 1 of the ddb instances removes the breakpoint.  It puts back
  the original instruction and continues without single-stepping.
- ddb instances after this enter with a breakpoint trap but no sign of
  the breakpoint in the instruction stream.  They must restart at where
  the breakpoint was.  This is very broken on x86 (except in my version).
  I think it used to work automatically on arm64, because the hardware
  doesn't advance the pc.  Now the software breaks it by advancing
  unconditionally.

Here is the ddb code for fixing up the pc after a breakpoint, with my fixes
applied:

X bool
X db_stop_at_pc(int type, int code, bool *is_breakpoint, bool *is_watchpoint)
X {
X       db_addr_t       pc;
X       db_breakpoint_t bkpt;
X X *is_breakpoint = IS_BREAKPOINT_TRAP(type, code);
X       *is_watchpoint = IS_WATCHPOINT_TRAP(type, code);
X       pc = PC_REGS();
X       if (db_pc_is_singlestep(pc))
X               *is_breakpoint = false;

ddb watchpoints are mostly broken and cause confusion.  x86 doesn't
have any, but maps x86 hardware breakpoints to ddb watchpoints.  I
think the latter are supposed to be in software and can be implemented
using software interrupts, but x86 doesn't bother.  It is unclear what
they can do that breakpoint software interrupts can't do -- they can
at least give a different class of breakpoints encoded by the software
interrupt number.  arm64 has a watchpoint exception number
EXCP_WATCHPOINT_EL1.  This maps to is_watchpoint = TRUE in the above.

X X db_clear_single_step();
X       db_clear_breakpoints();
X       db_clear_watchpoints();
X X #ifdef FIXUP_PC_AFTER_BREAK
X       if (*is_breakpoint) {
X           /*
X            * Breakpoint trap.  Fix up the PC if the
X            * machine requires it.
X            */
X           FIXUP_PC_AFTER_BREAK
X           pc = PC_REGS();
X       }
X #endif

FIXUP_PC_AFTER_BREAK must be defined by arches where the hardware doesn't
advance the pc over breakpoint instructions in the instruction stream.
On x86, it unadvances the pc by 1.  On arm64, it is not defined, since the
pc is already unadvanced.

X X /*
X        * Now check for a breakpoint at this address.
X        */
X       bkpt = db_find_breakpoint_here(pc);
X       if (bkpt) {
X           if (--bkpt->count == 0) {
X               bkpt->count = bkpt->init_count;
X               *is_breakpoint = true;
X               return (true);  /* stop here */
X           }
X           return (false);     /* continue the countdown */
X       } else if (*is_breakpoint) {
X #ifdef BKPT_SKIP

We unadvanced the pc to look it up in the breakpoint table.  Now we
must advance it over the breakpoint instruction to restart from there
since it was not in our table.  It should be part of the normal instruction
stream.  If another debugger overwrote the instruction stream and gave
control to us, then that is a bug in the other debugger and we will
probably crash soon.

X #ifdef SMP
X           /*
X            * In the SMP case, 'bkpt' doesn't tell us if this breakpoint
X            * was controlled by us when we hit it, since the breakpoint
X            * may have been cleared during a ddb entry by another CPU
X            * which won the race to enter.  Only skip if the breakpoint
X            * is still in the instruction stream.  Otherwise, turn the
X            * breakpoint into an unknown trap except for ignoring it if
X            * we are continuing.
X            */
X           BKPT_INST_TYPE memval, testval;
X X memval = db_get_value(pc, BKPT_SIZE, FALSE);
X           testval = BKPT_SET(&testval);
X           if (bcmp(&memval, &testval, BKPT_SIZE) == 0)
X               BKPT_SKIP;
X           else {
X               *is_breakpoint = false;
X               if (db_run_mode == STEP_CONTINUE)
X                   return (false);     /* continue */
X           }

For the SMP case, the other debugger can be ddb, and this is my fix for it.

X #else
X           BKPT_SKIP;

The !SMP case is simpler.  BKPT_SKIP usually just advances the pc by
BKPT_SIZE, but does a contitional advance related to the current problem
on mips.

X #endif
X #endif
X       }

We already returned with the unadvanced pc if we are contining over a
breakpoint.  Advancing in the caller breaks this.  Unadvancing breaks it
more.

X X *is_breakpoint = false; /* might be a breakpoint, but not ours */

Here we continue with the advanced pc.  Again it seems to be correct on
arm64, and advancing or unadvancing in the caller breaks it.

Here are my fixes in the above code:

X Index: db_run.c
X ===================================================================
X --- db_run.c  (revision 325403)
X +++ db_run.c  (working copy)
X @@ -36,6 +36,7 @@
X  __FBSDID("$FreeBSD$");
X X #include <sys/param.h>
X +#include <sys/systm.h>
X  #include <sys/kdb.h>
X  #include <sys/proc.h>
X X @@ -129,8 +130,31 @@
X           return (false);     /* continue the countdown */
X       } else if (*is_breakpoint) {
X  #ifdef BKPT_SKIP
X +#ifdef SMP
X +         /*
X +          * In the SMP case, 'bkpt' doesn't tell us if this breakpoint
X +          * was controlled by us when we hit it, since the breakpoint
X +          * may have been cleared during a ddb entry by another CPU
X +          * which won the race to enter.  Only skip if the breakpoint
X +          * is still in the instruction stream.  Otherwise, turn the
X +          * breakpoint into an unknown trap except for ignoring it if
X +          * we are continuing.
X +          */
X +         BKPT_INST_TYPE memval, testval;
X +
X +         memval = db_get_value(pc, BKPT_SIZE, FALSE);
X +         testval = BKPT_SET(&testval);

BKPT_SET() is a misimplemented macro that takes an arg which is always ignored;
it is always implemented as a manifest constant (ddb macros mostly have the
opposite error of looking like manifest constants but being function-like
macros with no args but side effects).  It is apparently meant for setting
a breakpoint instruction directly in the instruction stream, but that is
too hard so it is always used as above by setting a local variable and then
copying that to memory using a special ddb access function.

X +         if (bcmp(&memval, &testval, BKPT_SIZE) == 0)
X               BKPT_SKIP;

kib didn't like the MD'ness of this, but it is no worse than hard-coding
the setting of the instruction using the accesss function because the
BKPT_SET() macro is useless for applying MD settings to the instruction
stream (as used, it might apply them only to a local variable)).

X +         else {
X +             *is_breakpoint = false;
X +             if (db_run_mode == STEP_CONTINUE)
X +                 return (false);     /* continue */
X +         }
X +#else
X +         BKPT_SKIP;
X  #endif
X +#endif
X       }
X X *is_breakpoint = false; /* might be a breakpoint, but not ours */
X

Behaviour of various arches for advancing and unadvancing the pc after
a breakpoint:
- and64, i386: +-1 over 1-byte instruction.  1 is spelled as itself instead
  of as BKPT_SIZE in the implementation.  Most implentations have bogus
  parentheses around literal constants.
- arm: +4 and no unadvance over 4-byte instruction.  4 is spelled BKPT_SIZE
  and further obfuscated by spelling BKPT_SIZE as INSN_SIZE in the
  implementation.  INSN_SIZE is defined in another header, so db_machdep.h
  requires more namespace pollution than it should.  BKPT_INSTRUCTION is
  similarly obfuscated by spelling it as KERNEL_BREAKPOINT which is defined
  in yet another header so even more pollution is required.
- arm64: like arm except without the pollution.  Minor obfuscation of spelling
  4 as BKPT_SIZE in the implementation.  The change should use the macro
  if it is correct, but spells BKPT_SIZE as -4.
- mips: +4 and no unadvance over 4-byte instruction.  4 is spelled BKPT_SIZE
  BKPT_SIZE but BKPT_SIZE is not obfuscated as on arm.  BREAKPOINT_INSTRUCTION
  doesn't exit and shouldn't be exported as on arm*, but its literal value
  is more interesting.  mips apparently has many software interrupts which
  give the same debugger exception and the instruction has to be decoded
  to decide which one.  BKPT_SE() returns one for ddb, MIPS_BREAK_DDB, and
  pollution is implemented by not defining this in ddb's header file.
  BKPT_SKIP() is complicted to read the instruction stream much like my
  patch does.  Although MIPS_BREAK_DDB is a single instruction, BKPT_SKIP
  ignores some bits in the comparision so as to skip for this and some
  other instructions.  It makes some sense to use different software
  interrupts for different debuggers, but I don't see how this can fix
  the problem of debuggers possibly corrupting each others' instruction
  streams with breapoint instructions.  It can help detect the problem.
- powerpc: like arm64
- riscv: like arm
- sparc64: like arm64, except it spells BKPT_SIZE as 4 in the implementation
  and has a secondary pc which it advances.

amd64 and i386 are the only arches where the hardware advances over the
breakpoint so that the unadvance is needed.

kdb seems to be most broken for powerpc.  powerpc has no advance and no
unadvance.  It doesn't even call kdb_reenter().

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to