subject:"\[PATCH v5\] Fix INT1 Recursion with unregistered breakpoints"

Re: [PATCH v5] Fix INT1 Recursion with unregistered breakpoints

2015-12-19 Thread Jeff Merkey

>   /*
> +  * Check if we got an execute breakpoint, if so
> +  * set the resume flag to avoid int1 recursion.
> +  */
> + if ((dr7 & (DR_RW_MASK << ((i * DR_CONTROL_SIZE) +
> + DR_CONTROL_SHIFT))) == DR_RW_EXECUTE)
> + args->regs->flags |= X86_EFLAGS_RF;
> +
> + /*

This patch is functionaly correct but I still need to make it perfect.
I more try and test run.
I tested this and it works, but I need to shift it the other way
(right) rather than (left) since DR_RW_EXECUTE is always 0 this will
always work but to be perfect I need to do:

if (((dr7 >> ((i * DR_CONTROL_SIZE) + DR_CONTROL_SHIFT)) & DR_RW_MASK)
== DR_RW_EXECUTE)

and that is perfect syntax.

Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5] Fix INT1 Recursion with unregistered breakpoints

2015-12-19 Thread Jeff Merkey

Please consider the attached patch.

SUMMARY

This patch corrects a hard lockup failure of the system kernel if the
operating system receives a breakpoint exception at a code execution
address which was not registered with the operating system.  The patch
allows kernel debuggers, application profiling and performance modules,
and external debugging tools to work better together at sharing the
breakpoint registers on the platform in a way that they do not cause
errors and system faults, and enables the full feature set in the
breakpoint API.  If a kernel application triggers a breakpoint
or programs one in error, this patch will catch the condition and report
it to the system log without the operating system experiencing a system
fault.  There are several consumers of the Linux Breakpoint API and all
of them can and sometimes do cause the condition this patch corrects.

CONDITIONS WHICH RESULT IN THIS SYSTEM FAULT

This system fault can be caused from several sources.  Any kernel code
can access the debug registers and trigger a breakpoint directly by
writing to these registers and trigger a hard system hang if no
breakpoint was registered via arch_install_hw_breakpoint().

kgdb/kdb and the perf event system both present garbage status in dr6
then subsequently write this status into the thread.debugreg6 variable,
then in some cases call hw_breakpoint_restore() which writes this
status back into the dr6 hardware register.

arch/x86/kernel/kgdb.c
static void kgdb_hw_overflow_handler(struct perf_event *event,
struct perf_sample_data *data, struct pt_regs *regs)
{
struct task_struct *tsk = current;
int i;

for (i = 0; i < 4; i++)
if (breakinfo[i].enabled)
tsk->thread.debugreg6 |= (DR_TRAP0 << i);
}

arch/x86/kernel/kgdb.c
static void kgdb_correct_hw_break(void)
{
... snip ...

if (!dbg_is_early)
hw_breakpoint_restore();

... snip ...
}

arch/x86/kernel/hw_breakpoint.c
void hw_breakpoint_restore(void)
{
set_debugreg(__this_cpu_read(cpu_debugreg[0]), 0);
set_debugreg(__this_cpu_read(cpu_debugreg[1]), 1);
set_debugreg(__this_cpu_read(cpu_debugreg[2]), 2);
set_debugreg(__this_cpu_read(cpu_debugreg[3]), 3);
set_debugreg(current->thread.debugreg6, 6);
set_debugreg(__this_cpu_read(cpu_dr7), 7);
}

The hardware only altars those bits that change, the rest of the altered
dr6 value remains in the register.

Upon the next int1 exception, dr6 presents this manufactured status to
the int1 handler in hw_breakpoint.c which calls the non-existent
breakpoint exceptions and any handlers which may have validly
registered, creating phantom events.  If other subsystems which call
the perf handlers also have breakpoints registered, this
manufactured status causes erroneous events to be signaled to the layers
above.

arch/x86/kernel/hw_breakpoint.c
static int hw_breakpoint_handler(struct die_args *args)
{
... snip ...

/* Handle all the breakpoints that were triggered */
for (i = 0; i < HBP_NUM; ++i) {
if (likely(!(dr6 & (DR_TRAP0 << i
continue;

... snip ...

perf_bp_event(bp, args->regs);

... snip ...
}

After a few iterations of this cycling through the system, the
thread.debugreg6 variable starts to resemble a random number generator
as far as to which breakpoint just occurred.

The perf handlers cause a different incarnation of this problem and
create the situation by triggering a stale breakpoint set in dr7 for
which the perf bp is NULL (not registered) or late and for which there
is a single code path that fails to set the resume flag and clear the
int1 exception status.

TESTING AND REVIEW PERFORMED

I have reviewed all the code that touches this patch and have
determined it will function and support all of the software that
depends on this handler properly.  I have compiled and tested this
patch with a test harness that tests the robustness of the linux
breakpoint API and handlers in the following ways:

1.  Setting multiple conditional breakpoints through
arch_install_hw_breakpoint API across four processors to test the rate
at which the interface can handle breakpoint exceptions

2.  Setting unregistered breakpoints to test the handlers robustness
in dealing with error handling conditions and errant or spurious
hardware conditions and to simulate actual "lazy debug register
switching" with null bp handlers to test the
robustness of the handlers.

3.  Clearing and setting breakpoints across multiple processors then
triggering concurrent exceptions in both interrupt and process
contexts.

This patch improves robustness in several ways in the linux kernel:

1.  Corrects bug in handling unregistered breakpoints.

2.  Provides hardware check of dr7 to determine source of breakpoint
if OS cannot ascertain the int1 source from its own state

Re: [PATCH v5] Fix INT1 Recursion with unregistered breakpoints

2015-12-19 Thread Jeff Merkey

>   /*
> +  * Check if we got an execute breakpoint, if so
> +  * set the resume flag to avoid int1 recursion.
> +  */
> + if ((dr7 & (DR_RW_MASK << ((i * DR_CONTROL_SIZE) +
> + DR_CONTROL_SHIFT))) == DR_RW_EXECUTE)
> + args->regs->flags |= X86_EFLAGS_RF;
> +
> + /*

This patch is functionaly correct but I still need to make it perfect.
I more try and test run.
I tested this and it works, but I need to shift it the other way
(right) rather than (left) since DR_RW_EXECUTE is always 0 this will
always work but to be perfect I need to do:

if (((dr7 >> ((i * DR_CONTROL_SIZE) + DR_CONTROL_SHIFT)) & DR_RW_MASK)
== DR_RW_EXECUTE)

and that is perfect syntax.

Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v5] Fix INT1 Recursion with unregistered breakpoints

2015-12-19 Thread Jeff Merkey

Please consider the attached patch.

SUMMARY

This patch corrects a hard lockup failure of the system kernel if the
operating system receives a breakpoint exception at a code execution
address which was not registered with the operating system.  The patch
allows kernel debuggers, application profiling and performance modules,
and external debugging tools to work better together at sharing the
breakpoint registers on the platform in a way that they do not cause
errors and system faults, and enables the full feature set in the
breakpoint API.  If a kernel application triggers a breakpoint
or programs one in error, this patch will catch the condition and report
it to the system log without the operating system experiencing a system
fault.  There are several consumers of the Linux Breakpoint API and all
of them can and sometimes do cause the condition this patch corrects.

CONDITIONS WHICH RESULT IN THIS SYSTEM FAULT

This system fault can be caused from several sources.  Any kernel code
can access the debug registers and trigger a breakpoint directly by
writing to these registers and trigger a hard system hang if no
breakpoint was registered via arch_install_hw_breakpoint().

kgdb/kdb and the perf event system both present garbage status in dr6
then subsequently write this status into the thread.debugreg6 variable,
then in some cases call hw_breakpoint_restore() which writes this
status back into the dr6 hardware register.

arch/x86/kernel/kgdb.c
static void kgdb_hw_overflow_handler(struct perf_event *event,
struct perf_sample_data *data, struct pt_regs *regs)
{
struct task_struct *tsk = current;
int i;

for (i = 0; i < 4; i++)
if (breakinfo[i].enabled)
tsk->thread.debugreg6 |= (DR_TRAP0 << i);
}

arch/x86/kernel/kgdb.c
static void kgdb_correct_hw_break(void)
{
... snip ...

if (!dbg_is_early)
hw_breakpoint_restore();

... snip ...
}

arch/x86/kernel/hw_breakpoint.c
void hw_breakpoint_restore(void)
{
set_debugreg(__this_cpu_read(cpu_debugreg[0]), 0);
set_debugreg(__this_cpu_read(cpu_debugreg[1]), 1);
set_debugreg(__this_cpu_read(cpu_debugreg[2]), 2);
set_debugreg(__this_cpu_read(cpu_debugreg[3]), 3);
set_debugreg(current->thread.debugreg6, 6);
set_debugreg(__this_cpu_read(cpu_dr7), 7);
}

The hardware only altars those bits that change, the rest of the altered
dr6 value remains in the register.

Upon the next int1 exception, dr6 presents this manufactured status to
the int1 handler in hw_breakpoint.c which calls the non-existent
breakpoint exceptions and any handlers which may have validly
registered, creating phantom events.  If other subsystems which call
the perf handlers also have breakpoints registered, this
manufactured status causes erroneous events to be signaled to the layers
above.

arch/x86/kernel/hw_breakpoint.c
static int hw_breakpoint_handler(struct die_args *args)
{
... snip ...

/* Handle all the breakpoints that were triggered */
for (i = 0; i < HBP_NUM; ++i) {
if (likely(!(dr6 & (DR_TRAP0 << i
continue;

... snip ...

perf_bp_event(bp, args->regs);

... snip ...
}

After a few iterations of this cycling through the system, the
thread.debugreg6 variable starts to resemble a random number generator
as far as to which breakpoint just occurred.

The perf handlers cause a different incarnation of this problem and
create the situation by triggering a stale breakpoint set in dr7 for
which the perf bp is NULL (not registered) or late and for which there
is a single code path that fails to set the resume flag and clear the
int1 exception status.

TESTING AND REVIEW PERFORMED

I have reviewed all the code that touches this patch and have
determined it will function and support all of the software that
depends on this handler properly.  I have compiled and tested this
patch with a test harness that tests the robustness of the linux
breakpoint API and handlers in the following ways:

1.  Setting multiple conditional breakpoints through
arch_install_hw_breakpoint API across four processors to test the rate
at which the interface can handle breakpoint exceptions

2.  Setting unregistered breakpoints to test the handlers robustness
in dealing with error handling conditions and errant or spurious
hardware conditions and to simulate actual "lazy debug register
switching" with null bp handlers to test the
robustness of the handlers.

3.  Clearing and setting breakpoints across multiple processors then
triggering concurrent exceptions in both interrupt and process
contexts.

This patch improves robustness in several ways in the linux kernel:

1.  Corrects bug in handling unregistered breakpoints.

2.  Provides hardware check of dr7 to determine source of breakpoint
if OS cannot ascertain the int1 source from its own state

Re: [PATCH v5] Fix INT1 Recursion with unregistered breakpoints

[PATCH v5] Fix INT1 Recursion with unregistered breakpoints

Re: [PATCH v5] Fix INT1 Recursion with unregistered breakpoints

[PATCH v5] Fix INT1 Recursion with unregistered breakpoints

4 matches

Site Navigation

Mail list logo

Footer information