RE: [PATCH] MIPS16/GCC: Emit bounds checking as RTL in `casesi'

2017-06-14 Thread Maciej W. Rozycki
On Mon, 12 Jun 2017, Matthew Fortune wrote:

> The only benefit would be if something could fill the load latency between
> obtaining the offset and adding to the base which seems low probability.

 Or if any intermediate results were found to be reusable in other 
calculations nearby (not that I think this is likely though).

> Thanks for the intricate commentary on this patch, makes it very easy to
> review.

 I hoped it would.  Writing a proper description also serves as the last 
point of reflection: "Did I get it right?  Haven't I missed anything?"

 Patch applied now, thank you for you review.

  Maciej


RE: [PATCH] MIPS16/GCC: Emit bounds checking as RTL in `casesi'

2017-06-12 Thread Matthew Fortune
Maciej Rozycki  writes:
>  Further to my changes made last November here is an update to the MIPS16
> `casesi' pattern making it emit bounds checking as RTL rather than having
> it as hardcoded assembly within the `casesi_internal_mips16_'
> dispatcher.  See below for how PR tree-optimization/51513 has prevented me
> from proceeding back then.
> 
>  This new arrangement has several advantages:
> 
> 1. There is no hardcoded BTEQZ branch instruction that has a limited span
>and can overflow causing an assembly failure if the target label is too
>distant.  GCC is able to relax out of range MIPS16 branches these days,
>but obviously they have to be expressed in RTL rather than buried in
>assembly code.  This overflow can be easily reproduced; please enquire
>for a boring example if interested.
> 
> 2. The `casesi_internal_mips16_' pattern now has an accurate length
>(aka instruction count) recorded as all its remaining emitted assembly
>instructions are known in advance to fit in their regular (16-bit)
>machine encodings.  Previously there was uncertainty about the SLTU and
>BTEQZ instructions used for the bounds check, which depending on their
>exact arguments could have each resulted in their either regular
>(16-bit) or extended (32-bit) encoding.  Consequently the worst length
>estimate was recorded instead, possibly causing worse code generation
>(e.g. premature out-of-range branch relaxation or jump table expansion
>from half-words to full words).
> 
> 3. GCC now has freedom to schedule code around bounds checking as it sees
>fit rather than having to adapt to the fixed assembly block.  In fact
>it tends to make use of it right away swapping BTEQZ for the BTNEZ
>instruction and rescheduling code such that the out-of-bounds (default)
>case executes linearly.
> 
> There are probably more benefits, but these are what has immediately come
> to my mind.

Sounds good, I certainly agree that we will see the benefits you list above.

>  As noted above I meant to propose this change along with the rest so as
> to have it in GCC 7, however emitting the bounds check as a separate RTL
> pattern triggered an unrelated bug, then unknown to me, causing a fatal
> code generation regression, and the lack of time did not allow me to
> investigate it further.  This was easily reproduced with a piece of code
> (reduced from actual Linux kernel code) like this:
> 
> $ cat frob.c
> int
> frob (int i)
> {
>   switch (i)
> {
> case -5:
>   return -2;
> case -3:
>   return -1;
> case 0:
>   return 0;
> case 3:
>   return 1;
> case 5:
>   break;
> default:
>   __builtin_unreachable ();
> }
>   return i;
> }
> 
> producing truncated assembly like this:
> 
> $ gcc -O2 -mips16 -mcode-readable=yes -dp -S frob.c
> $ cat frob.s
>   .file   1 "frob.c"
>   .section .mdebug.abi32
>   .previous
>   .nanlegacy
>   .module fp=32
>   .module nooddspreg
>   .abicalls
>   .option pic0
>   .text
>   .align  2
>   .weak   frob
>   .setmips16
>   .setnomicromips
>   .entfrob
>   .type   frob, @function
> frob:
>   .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
>   .mask   0x,0
>   .fmask  0x,0
>   addiu   $2,$4,5  # 11   *addsi3_mips16/7[length = 2]
>   .endfrob
>   .size   frob, .-frob
>   .ident  "GCC: (GNU) 7.0.0 20161117 (experimental)"
> $
> 
> -- where as you can see the whole switch statement has vanished along with
> any return path from the function, and only the preparatory addition
> emitted from `casesi' with:
> 
>   emit_insn (gen_addsi3 (reg, operands[0], offset));
> 
> remained.
> 
>  The causing bug has turned out to be what was filed as PR
> tree-optimization/51513 and has been kindly fixed by Peter recently
> (thanks, Peter!) with r247844 ("Fix PR51513, switch statement with default
> case containing __builtin_unreachable leads to wild branch"),
> , enabling me to
> proceed with this change without having to investigate the cause of code
> breakage -- which for the MIPS16 target has clearly turned out to be
> graver than a mere silly branch never to be executed.
> 
>  Given the previous troubles with this change I have decided to add
> MIPS16 test cases to verify that code truncation has not happened,
> complementing gcc.target/powerpc/pr51513.c, in case further tweaks in this
> area might do something bad.  This would be caught by
> gcc.target/mips/insn-casesi.c added with r242424, but that test case does
> not refer to PR tree-optimization/51513, so let's make it explicit.  With
> the PR tree-optimization/51513 fix removed the two new cases indeed cause:
> 
> FAIL: gcc.target/mips/pr51513-1.c   -O2   scan-assembler \tjrc?\t\\$31\n
> FAIL: