Re: [OpenRISC] [Openrisc] [RFC patch] OR1200 debug unit failing to continue from some stalls

Franck Jullien Wed, 03 Oct 2012 15:46:24 -0700

2012/9/25 Yann Vernier <[email protected]>:
> On Tue, 25 Sep 2012 03:16:48 -1000
> Julius Baxter <[email protected]> wrote:
>
>> On Tue, Sep 25, 2012 at 12:28 AM, Yann Vernier
>> <[email protected]> wrote:
>> > The thread http://opencores.org/forum,OpenRISC,0,5005 about GDB
>> > failing to single-step after certain breakpoints led us to a logic
>> > error in the debug unit. When the pipeline is stalled for other
>> > reasons, for instance an instruction fetch (therefore affected by
>> > instruction cache), at the cycle _after_ a trap instruction is
>> > decoded, the trap signal is never lowered since no new instruction
>> > is decoded. This locks the debug unit into stalling continuously.
>>
>> Hi Yann,
>>
>> Sounds like good work on tracking this down.
>>
>> Can this be recreated in simulation via the JTAG-via-VPI interface?
>> I'm not saying you need to, but it would be good to know that this bug
>> is fixed by observing it in simulation, and/or having a regression
>> test which, or instructions how to, trigger this bug. How did you work
>> this out this is the right fix?
>
> Yes. We confirmed the issue on hardware, reproduced it in simulation
> with VPI interface, saw the sig_trap line getting stuck, and found that
> it overrides stall register updates over JTAG via code inspection. The
> test case we used was or1200-cbasic, where (for this particular
> compilation) breakpoints worked on main but not test_bss. gdb was
> manually controlled, using break and si commands.
> The observed failure was when if_stall coincided with except_stop.
>
> This fix was then verified in simulation and synthesis.
>
>> > Attached is a workaround to make the debug unit aware that the
>> > sig_trap and sig_syscall signals only update when the ex stage is
>> > not stalled. This fixes the issue we've observed, although similar
>> > conditions may exist for any other exceptions. Also note that the
>> > debug interface uses a priority encoder to only catch one of the
>> > events that occur; short-lived events may therefore be ignored
>> > completely.
>>
>> Can you elaborate on this? Which "short-lived events" are you talking
>> about?
>
> By short-lived I mean exception signals which go back to 0 without being
> acknowledged. sig_trap and sig_syscall are simply two of the bits in the
> except_stop vector (which goes to the debug unit) as well as its relative
> except_trig (goes to the CPU). I'm simply not sure if any others may be
> short-lived, but the debug unit only catches one per cycle because the
> du_except_stop to except_stop recoder is a priority encoder.
> sig_trap is normally short-lived, but being part of the pipeline, may
> stall along with it.
>
> The patch forces the debug unit stall reason of a trap exception to
> seem short lived, such that if the debug unit happens to stop for
> another reason, it may not be evident that it was about to trap as
> well. In current gdb use, the DSR register only captures trap, so it
> may not be a problem.
>
> I'm not sure why the debug unit is written to only capture one reason
> per cycle anyway. It's going to stall anyhow, just not record all
> triggers.
>
>> > Any comments? Personally I feel it's a bit of a stopgap measure,
>> > addressing this problem narrowly, but it's better than the current
>> > state.
>>
>> Can you please post the patch inline?
>
> Certainly (assuming my mail client won't mangle it).
>
>
> Index: or1200_du.v
> ===================================================================
> --- or1200_du.v (revision 663)
> +++ or1200_du.v (working copy)
> @@ -412,6 +412,7 @@
>  wire                           dwcr0_sel,
>                                 dwcr1_sel;      // DWCR selects
>  reg                            dbg_bp_r;
> +reg                            ex_freeze_q;
>  `ifdef OR1200_DU_HWBKPTS
>  reg    [31:0]                  match_cond0_ct;
>  reg    [31:0]                  match_cond1_ct;
> @@ -529,12 +530,16 @@
>  assign dwcr1_sel = (spr_cs && (spr_addr[`OR1200_DUOFS_BITS] == 
> `OR1200_DU_DWCR1));
>  `endif
>
> +// Track previous ex_freeze to detect when signals are updated
> +always @(posedge clk)
> +  ex_freeze_q <= ex_freeze;
> +
>  //
>  // Decode started exception
>  //
>  // du_except_stop comes from or1200_except
>  //
> -always @(du_except_stop) begin
> +always @(du_except_stop or ex_freeze_q) begin
>         except_stop = 14'b00_0000_0000_0000;
>         casez (du_except_stop)
>                 14'b1?_????_????_????:
> @@ -566,16 +571,16 @@
>                         except_stop[`OR1200_DU_DRR_RE] = 1'b1;
>                 end
>                 14'b00_0000_0000_01??: begin
> -                       except_stop[`OR1200_DU_DRR_TE] = 1'b1;
> +                       except_stop[`OR1200_DU_DRR_TE] = 1'b1 & ~ex_freeze_q;
>                 end
>                 14'b00_0000_0000_001?: begin
>                         except_stop[`OR1200_DU_DRR_FPE] = 1'b1;
>                 end
>                 14'b00_0000_0000_0001:
> -                       except_stop[`OR1200_DU_DRR_SCE] = 1'b1;
> +                       except_stop[`OR1200_DU_DRR_SCE] = 1'b1 & ~ex_freeze_q;
>                 default:
>                         except_stop = 14'b00_0000_0000_0000;
> -       endcase
> +       endcase // casez (du_except_stop)
>  end
>
>  //
> _______________________________________________
> Openrisc mailing list
> [email protected]
> http://lists.opencores.org/listinfo/openrisc


Hi,

I think we should try to not get this patch be lost in the ML. Yann,
you should at least post this on the openrisc forum. I think a lot of
people (including me) observed this anoying bug. I'll try that patch
as soon as I can.

Thanks for your investigation on this.

Franck.
_______________________________________________
OpenRISC mailing list
[email protected]
http://lists.openrisc.net/listinfo/openrisc

Re: [OpenRISC] [Openrisc] [RFC patch] OR1200 debug unit failing to continue from some stalls

Reply via email to