[Bug target/78660] [7 Regression] 7.0 bootstrap fail on mips64el-unknow-linux: configure-stage2-target-libgcc' failed

2017-01-13 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660

Matthew Fortune  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot gnu.org

--- Comment #9 from Matthew Fortune  ---
@eric: adding you following our discussion on PR rtl-optimization/59461

I have added my findings so far on this ticket.

[Bug target/78660] [7 Regression] 7.0 bootstrap fail on mips64el-unknow-linux: configure-stage2-target-libgcc' failed

2017-01-13 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660

--- Comment #8 from Matthew Fortune  ---
Created attachment 40518
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40518=edit
testcase

I have narrowed this bug down to a mis-compilation of gcc/c/c-decl.c where
there are a few code differences after applying yunqiang's partial revert. A
reduced and pre-processed c-decl.c file is attached.

mips64el-linux-gnu-g++ -fno-PIE -c  -DIN_GCC_FRONTEND  -O2 -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-fno-common  -o c-decl.me.o c-decl.me.c

The key (I think) is that the following sequence of 3 instructions ends up
being combined into 1 but the resulting instruction leaves the upper 32-bits of
reg 316 entirely undefined. Eventually this leads to reg 316 being spilled to
the stack where it is allocated a 64-bit slot but this spill only writes
32-bits whereas consumers read 64-bit and even if the value will only ever be
operated on as 32-bit or less then logical and branch operations on the
reloaded value will go wrong and normal 32-bit operations will be (strictly)
undefined.

(insn 56 55 57 3 (set (reg:SI 470)
(ne:SI (reg/f:DI 469 [ current_scope.1_1->bindings ])
(const_int 0 [0])))
"/althome/mips/v6/src/gcc/gcc/c/c-decl.me.c":915 501 {*sne_zero_disi}
 (expr_list:REG_DEAD (reg/f:DI 469 [ current_scope.1_1->bindings ])
(nil)))
(insn 57 56 58 3 (set (reg:QI 468)
(subreg:QI (reg:SI 470) 0))
"/althome/mips/v6/src/gcc/gcc/c/c-decl.me.c":915 360 {*movqi_internal}
 (expr_list:REG_DEAD (reg:SI 470)
(nil)))
(insn 58 57 59 3 (set (reg:DI 316 [ iftmp.3_114 ])
(zero_extend:DI (reg:QI 468)))
"/althome/mips/v6/src/gcc/gcc/c/c-decl.me.c":915 214 {*zero_extendqidi2}
 (expr_list:REG_DEAD (reg:QI 468)
(nil)))

combines to...

(insn 58 57 59 3 (set (subreg:SI (reg:DI 316 [ iftmp.3_114 ]) 0)
(ne:SI (reg/f:DI 469 [ current_scope.1_1->bindings ])
(const_int 0 [0])))
"/althome/mips/v6/src/gcc/gcc/c/c-decl.me.c":915 501 {*sne_zero_disi}
 (expr_list:REG_DEAD (reg/f:DI 469 [ current_scope.1_1->bindings ])
(nil)))

The relevant fragment of combine is:

Trying 57 -> 58:
Successfully matched this instruction:
(set (reg:DI 316 [ iftmp.3_114 ])
(subreg:DI (reg:SI 470) 0))
allowing combination of insns 57 and 58
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 57.
modifying insn i358: r316:DI=r470:SI#0
  REG_DEAD r470:SI
deferring rescan insn with uid = 58.

Trying 56 -> 58:
Successfully matched this instruction:
(set (subreg:SI (reg:DI 316 [ iftmp.3_114 ]) 0)
(ne:SI (reg/f:DI 469 [ current_scope.1_1->bindings ])
(const_int 0 [0])))
allowing combination of insns 56 and 58
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 56.
modifying insn i358: r316:DI#0=r469:DI!=0
  REG_DEAD r469:DI
deferring rescan insn with uid = 58.

I've not started investigating why exactly this decision is made.

[Bug rtl-optimization/59461] missed zero-extension elimination in the combiner

2017-01-13 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59461

--- Comment #10 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #9)
> > This is a notoriously hard topic to address. All instructions affect the
> > full 64-bit register including those that do 32-bit arithmetic i.e. they
> > will set/clear the upper bits to replicate bit-31.
> 
> So there are different 32-bit and 64-bit 'add' instructions for example? 
> That might nevertheless be OK for WORD_REGISTER_OPERATIONS.

Yes. But logical operations are the same machine instructions regardless of
whether it is being used for a 64-bit logical op or 32-bit logical op.

> > In terms of instruction definition we therefore have instructions that
> > operate on DImode and instructions that operate on SImode. The SImode
> > instructions just don't need to worry about what is happening with the upper
> > bits.
> 
> Likewise for SPARC, which is WORD_REGISTER_OPERATIONS too, but doesn't care
> about the upper bits in 32-bit mode and has a single 'add' instruction (but
> maintains 32-bit _and_ 64-bit condition codes for every instruction).

So are the upper bits for SPARC completely undefined? That would then be the
major distinction between MIPS and SPARC. The upper bits are defined for MIPS

> > I don't know if any of that subtlety affects this yet.
> 
> Bugs in the nonzero_bits machinery are not to be ruled out either.  I'm
> willing to help of course but I don't have access to MIPS64 hardware.

I'm yet to even narrow down to an object file affected owing to the bug only
showing in stage2 build so far.

Thanks for the offer of help, I'll add more when I find it.

[Bug rtl-optimization/59461] missed zero-extension elimination in the combiner

2017-01-12 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59461

--- Comment #8 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #7)
> > I'm yet to get my head around what the issue is but if anyone has a pointer
> > based on the potential impact on MIPS64 as described above then I'd be
> > grateful.
> 
> Is WORD_REGISTER_OPERATIONS correct for MIPS64, i.e. do all instructions
> operate on the full 64-bit integer registers?

This is a notoriously hard topic to address. All instructions affect the full
64-bit register including those that do 32-bit arithmetic i.e. they will
set/clear the upper bits to replicate bit-31. However, according to the
architecture they logically operate on 32-bits and require that all inputs are
canonical (sign bit replicated) otherwise the operation is invalid. So it would
not matter whether the register was 33 bits or 1000 bits wide as long as all
bits from 32 upwards replicate bit-31. The upper bits only become relevant once
a 32-bit value is cast to a 64-bit value where sign extension is free and zero
extension is an operation. truncation from 64-bit to 32-bit is a sign extension
from bit-31 regardless of whether it is truncating to signed or unsigned.

In terms of instruction definition we therefore have instructions that operate
on DImode and instructions that operate on SImode. The SImode instructions just
don't need to worry about what is happening with the upper bits.

I don't know if any of that subtlety affects this yet.

[Bug rtl-optimization/59461] missed zero-extension elimination in the combiner

2017-01-12 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59461

Matthew Fortune  changed:

   What|Removed |Added

 CC||matthew.fortune at imgtec dot 
com

--- Comment #6 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #5)
> Fixed in GCC 7.

There is a reasonable chance that this patch broke mips64 n64 but I do not have
confirmation yet. See PR target/78660.

I can vaguely see how this patch may affect MIPS64 in terms of how 32-bit
values are handled on a MIPS64 architecture with the sign bit replicated to the
upper-32 bits. This presumably is somehow not accounted for in the nonzero_bits
logic.

I'm yet to get my head around what the issue is but if anyone has a pointer
based on the potential impact on MIPS64 as described above then I'd be
grateful.

[Bug tree-optimization/15826] don't use "if" to extract a single bit bit-field.

2016-12-20 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15826

Matthew Fortune  changed:

   What|Removed |Added

 CC||matthew.fortune at imgtec dot 
com

--- Comment #17 from Matthew Fortune  ---
(In reply to Bill Schmidt from comment #15)
> ;; Function andrew (andrew, funcdef_no=2, decl_uid=2361, cgraph_uid=2,
> symbol_order=2)
> 
> andrew (struct s * p)
> {
>   unsigned int _4;
>   unsigned int _5;
>   unsigned int _6;
> 
>   :
>   _4 = BIT_FIELD_REF <*p_3(D), 32, 0>;
>   _5 = _4 & 1;
>   _6 = _5;
>   return _6;
> 
> }

MIPS sees this same spurious failure. The issue (on a GCC 6 branch) is that the
optimization has successfully happened it just looks like the check is wrong in
the testcase. The presence of a logical and is not harmful as it is a
legitimate way to truncate: (from match.pd)

/* A truncation to an unsigned type (a zero-extension) should be
   canonicalized as bitwise and of a mask.  */

This simplification kicks in for MIPS and presumably powerpc but does not (I
don't know why) for x86_64. There is no 'goto' demonstrating that a single bit
extract does not use an IF.

I therefore propose we just update the check to verify there is no 'goto'.

Matthew

[Bug target/78660] [7 Regression] 7.0 bootstrap fail on mips64el-unknow-linux: configure-stage2-target-libgcc' failed

2016-12-20 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660

--- Comment #3 from Matthew Fortune  ---
(In reply to Aldy Hernandez from comment #2)
> I can't reproduce on a cross build.  Is there a mips64el box on the compile
> farm or somewhere public so someone can look at this?

The following machines were added to the farm relatively recently. gcc24 is not
accepting my login currently though to gcc22 or gcc23 are usable. They will not
be fast though as they are modified routers. They are 32-bit but with 64-bit
multilibs I suspect that is good enough to reproduce. I won't find time to look
at this until the new year at least, any help is appreciated.

Operating system are currently configured as follows:
* gcc22 - kernel 3.14.10 (EB), Debian 7.10 (Wheezy), gcc 4.6.3
* gcc23 - kernel 4.1.4 (EL), Debian 8.4 (Jessie), gcc 4.9.2
* gcc24 - kernel 4.1.4 (EL), Debian 8.4 (Jessie), gcc 4.9.2 (EB = big endian,
EL = little endian)

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-04 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

--- Comment #9 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #8)
> > The expansion looks like an acceptable transformation to me i.e. it is not
> > introducing the overflow for the offending pointer just maintaining what is
> > already in the tree.
> 
> Wrap around for unsigned types is OK but, if expansion does implicit
> extensions to larger types, then things can easily go wrong.

Sure, and in this case there are no implicit extensions to larger types. The
bug does require an implicit extension to occur but this only happens at
runtime when on 64-bit hardware and the core is not configured to be limited to
a 32-bit address space which is always true for a 64-bit kernel as it turns
out.

> > I'm still not sure if there is really a bug. Should reassoc not be doing
> > this for 'sizetype'? Should ivopts not have created the mess in the first
> > place? Would changing either of these actually introduce an assurance that
> > this situation won't occur from other optimisations?
> 
> sizetype is unsigned so all the transformations looks valid.

I'm leaning heavily towards not-a-compiler-bug but may need to offer a
workaround until we can solve this in the Linux kernel.

The only workaround possible is to provide an option to disable register+index
addressing i.e. remove [ls][wd]xc1 instructions.

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-04 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

--- Comment #7 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #6)
> > The issue may stem from the C front end where the dumps start off as below.
> > Note that the '-1' in kappa-1 has ended up being represented as 1073741823
> > (0x3fff) presumably owing to the fact it will be multiplied by 4
> > eventually and hence be 'safe'.
> 
> All pointer calculations are done in sizetype and it is unsigned.  This kind
> of issues generally come from the RTL level, maybe expansion here.

The expansion looks like an acceptable transformation to me i.e. it is not
introducing the overflow for the offending pointer just maintaining what is
already in the tree.

I dug back to reassoc2 based on Andrew's comment which gets the following as
input:

  _48 = (sizetype) kappa_6(D);
  _49 = _48 * 8;
  _50 = s_37(D) + _49;
  ivtmp.19_47 = (unsigned int) _50

and then in the loop:

  _54 = (sizetype) s_37(D);
  _55 = ivtmp.19_3 - _54;
  _56 = _55 + 4294967280;
  # RANGE [0, 4294967295] NONZERO 4294967288
  _19 = _56;
  # PT = nonlocal escaped 
  _20 = _17 + _19;
  # VUSE <.MEM_4>
  _21 = *_20

This includes a pointer subtraction of 's' (somewhat pointless but valid I
believe).

reassoc2 changes it to this:

  _48 = (sizetype) kappa_6(D);
  _49 = _48 * 8;
  _50 = s_37(D) + _49;
  ivtmp.19_47 = (unsigned int) _50;

and in the loop:

  _54 = (sizetype) s_37(D);
  _38 = -_54;
  _25 = 4294967280 - _54;
  _56 = _25 + ivtmp.19_3;
  # RANGE [0, 4294967295] NONZERO 4294967288
  _19 = _56;
  # PT = nonlocal escaped 
  _20 = _17 + _19;
  # VUSE <.MEM_4>
  _21 = *_20;

And _56 is where we get the overflow.

I'm still not sure if there is really a bug. Should reassoc not be doing this
for 'sizetype'? Should ivopts not have created the mess in the first place?
Would changing either of these actually introduce an assurance that this
situation won't occur from other optimisations?

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-03 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

Matthew Fortune  changed:

   What|Removed |Added

 CC||matthew.fortune at imgtec dot 
com

--- Comment #5 from Matthew Fortune  ---
I'm struggling to see what can be fixed in GCC for this (and I'm not sure I
follow why there is a problem with GCC's behaviour) but...

The issue may stem from the C front end where the dumps start off as below.
Note that the '-1' in kappa-1 has ended up being represented as 1073741823
(0x3fff) presumably owing to the fact it will be multiplied by 4 eventually
and hence be 'safe'. I can't figure out so far what creates this value but the
behavior started some time between gcc 4.7 and 4.8 I believe. The questionably
bad code then shows up dependent on optimisations and instruction availability.
If the calculation had been done with ADDU instead of in the LDXC1 then all
would be well.

Regardless of whether this is a real bug for the compiler then the Linux kernel
should be making use of the status.UX bit in order to properly emulate a 32-bit
environment for o32 on 64-bit kernel/hardware and had this been run using a
32-bit kernel (on 64-bit hardware) then it would run correctly. There is an
issue for N32 code with the UX bit in that you also need the PX bit available
to independently control 64-bit instruction availability vs address space but
this is only missing in MIPSIV cores.

;; Function ldxc1_test (null)
;; enabled by -tree-original


{
  int kappa2 = kappa;
  double tmp = 0.0;

int kappa2 = kappa;
double tmp = 0.0;
  :;
  kappa-- ;
  if (zeros + 1 < kappa)
{
  tmp = *(*(r->rows + ((sizetype) kappa + 1073741823) * 4) + ((sizetype)
kappa + 536870911) * 8) * ctt;
  tmp = ldexp (tmp, *(expo + ((sizetype) kappa + 1073741823) * 4) - *(expo
+ (sizetype) ((unsigned int) kappa2 * 4)));
}
  if (zeros + 1 < kappa && *(s + ((sizetype) kappa + 536870911) * 8) <= tmp)
goto ; else goto ;
  :;
  return tmp;

[Bug java/71917] [7 regression] libjava.jar/ReturnProxyTest.jar etc. FAIL

2016-09-09 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71917

--- Comment #12 from Matthew Fortune  ---
Created attachment 39593
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39593=edit
Proposed fix

Attached fix should resolve the issue on sparc64 BE.

The original attempt at the fix for mips64el is reverted and instead the rvalue
conversion for the native FFI interface now copes with 64-bit little endian
targets ensuring sign/zero extension is performed appropriately. The fix is for
all architectures but presumably others do not suffer if the sign/zero
extension is missing (or have not noticed).

[Bug java/71917] [7 regression] libjava.jar/ReturnProxyTest.jar etc. FAIL

2016-08-30 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71917

--- Comment #10 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #9)
> > I'll certainly check on this but I did run the fix on both big and little
> > endian MIPS which seems to suggest there isn't a double adjustment overall.
> 
> So this was broken in 64-bit big-endian too before your fix?

Ah, I did BE/LE 32-bit and LE 64-bit. The bug was seen on LE 64-bit.

I'm getting my head around this now! The conversion functions will need to
perform the 32-bit to 64-bit sign extension but do nothing for the reverse. I
think this means that only the raw-to-rvalue needs updating based on the
function names.

I'll give it a go.

[Bug java/71917] [7 regression] libjava.jar/ReturnProxyTest.jar etc. FAIL

2016-08-30 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71917

--- Comment #8 from Matthew Fortune  ---
(In reply to Eric Botcazou from comment #7)
> > 2016-07-13  Matthew Fortune  
> > 
> > * interpret-run.cc: Use ffi_arg for FFI integer return types.
>
> so we now have a double adjustment on 64-bit big-endian and this breaks. 
> Maybe there is something missing for little-endian MIPS in java_raw_api.c.

I'll certainly check on this but I did run the fix on both big and little
endian MIPS which seems to suggest there isn't a double adjustment overall. 
The layers involved in this have however exceeded what I can fit in my head so
I'll have to draw things out to try and understand what was/should-be/is
happening.

Is there any definition for the raw API? I've seen some reference to arguments
being passed 'in the way a java stack works' but am not sure how that relates
to FFI return values passed in registers.

[Bug target/68273] [5/6/7 Regression] Wrong code on mips/mipsel due to (invalid?) peeking at alignments in function_arg.

2016-08-05 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68273

--- Comment #37 from Matthew Fortune  ---
(In reply to rguent...@suse.de from comment #36)
> On Fri, 3 Jun 2016, matthew.fortune at imgtec dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68273
> > 
> > --- Comment #35 from Matthew Fortune  ---
> > (In reply to Aurelien Jarno from comment #33)
> > > (In reply to Hector Oron from comment #32)
> > > > (In reply to Richard Biener from comment #31)
> > > > > eipa_sra introduces the remaining SSA name with non-default alignment 
> > > > > via
> > > > 
> > > > [PATCH]
> > > > 
> > > 
> > > For the record, Debian now have the patch from comment #32 in its version 
> > > of
> > > GCC 5 and 6. This fixes a few more issues than reported here, in addition 
> > > to
> > > graphviz, it also fixes subversion and jq.
> > 
> > Richard: Is the updated patch from comment #32 in-keeping with your original
> > changes in comment #20? If so then I'll post it on gcc-patches.
> > 
> > I do accept that there are MIPS fixes to be made in this area but I'd like 
> > to
> > get the existing code to 'work' again first if possible.
> 
> Note that the tree-ssa.c changes from comment#32 do not work as-is
> (on the branch you don't notice that because checking is disabled).
> Even the relaxed variant that we don't have over-aligned types in the IL
> breaks AFAIR.
> 
> Given the analysis in comment #31 I'd rather patch up either
> turn_representatives_into_adjustments or ipa_modify_formal_parameters
> (which already seems to have code to avoid under-aligned types).
> 
> That is, does
> 
> Index: gcc/ipa-prop.c
> ===
> --- gcc/ipa-prop.c  (revision 237053)
> +++ gcc/ipa-prop.c  (working copy)
> @@ -3910,7 +3909,7 @@ ipa_modify_formal_parameters (tree fndec
>   if (is_gimple_reg_type (ptype))
> {
>   unsigned malign = GET_MODE_ALIGNMENT (TYPE_MODE 
> (ptype));
> - if (TYPE_ALIGN (ptype) < malign)
> + if (TYPE_ALIGN (ptype) != malign)
> ptype = build_aligned_type (ptype, malign);
> }
> }
> 
> fix the issue as well?

Sorry for the very slow reply, got busy with libjava bugs.

This does fix jq as well so it looks good. I haven't done a GCC testsuite run
with the change in place yet though.

[Bug java/71917] [7 regression] libjava.jar/ReturnProxyTest.jar etc. FAIL

2016-07-18 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71917

Matthew Fortune  changed:

   What|Removed |Added

 CC||matthew.fortune at imgtec dot 
com

--- Comment #2 from Matthew Fortune  ---
Hi Rainer,

Sorry for the bugs, I am however tempted to say this will turn out to be a
libffi bug. You have described the same failure mode as I fixed for MIPS but
libjava is now following the ffi return type rules so my assumption is that it
is sparc ffi that is not. I've had a quick read of sparc ffi code and it seems
there is a chance that integer return types are not being promoted to word
size. I.e. following the ffi rule that integers smaller than a word are
returned as type ffi_arg.

Do you know if the two new testcases fail if built against a gij built without
my changes to java/lang/reflect/natVMProxy.cc and interpret-run.cc? I suspect
they may actually pass before but for the wrong reasons.

Does 64-bit sparc have problems with other codebases using ffi? libguile,
python being some notable examples?

Matthew

[Bug target/68273] [5/6/7 Regression] Wrong code on mips/mipsel due to (invalid?) peeking at alignments in function_arg.

2016-06-03 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68273

--- Comment #35 from Matthew Fortune  ---
(In reply to Aurelien Jarno from comment #33)
> (In reply to Hector Oron from comment #32)
> > (In reply to Richard Biener from comment #31)
> > > eipa_sra introduces the remaining SSA name with non-default alignment via
> > 
> > [PATCH]
> > 
> 
> For the record, Debian now have the patch from comment #32 in its version of
> GCC 5 and 6. This fixes a few more issues than reported here, in addition to
> graphviz, it also fixes subversion and jq.

Richard: Is the updated patch from comment #32 in-keeping with your original
changes in comment #20? If so then I'll post it on gcc-patches.

I do accept that there are MIPS fixes to be made in this area but I'd like to
get the existing code to 'work' again first if possible.

[Bug rtl-optimization/65862] [MIPS] IRA/LRA issue: integers spilled to floating-point registers

2015-05-06 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65862

--- Comment #6 from Matthew Fortune matthew.fortune at imgtec dot com ---
(In reply to Robert Suchanek from comment #5)
  I am not sure, that the result code is better as we access memory 3
  times instead of access to $f20.
 
 On one hand, yes, it seems good but it's not always desirable to use FP regs
 until absolutely necessary. For instance, compiling the dynamic linker that
 uses FP regs does not seem to be right.

On the costs front then while it is true that moves between FPR and GPR are
usually cheaper than moving to memory and back there is a secondary cost which
is that simply turning on the FPU is costly. So, the reason for using FPRs
needs to be that the floating point instructions are used rather than
registers. Ideally we would not spill to FPRs unless there has been actual
floating point code used, this suggests it would be good to set the cost of
GPR-FPR higher than memory if no floating point code is present in the
function. Otherwise if the FPU is in use anyway then using FPRs as
scratch/spill for integer mode data is fine and the costs can be lower than
memory. 

Matthew


[Bug target/64569] New: [MIPS] Unable to build soft-float in conjunction with binutils 2.25

2015-01-12 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64569

Bug ID: 64569
   Summary: [MIPS] Unable to build soft-float in conjunction with
binutils 2.25
   Product: gcc
   Version: 4.9.3
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthew.fortune at imgtec dot com

binutils 2.25 for MIPS includes more aggressive checks on ABI usage and in
particular warns when given code that claims to be soft-float via
.gnu_attribute 4,3 but is assembled without -msoft-float. binutils 2.25 will
also infer an ABI from the floating-point related command line options.

No GCC compiler drivers prior to GCC 5.0 pass the -msoft-float option through
to the assembler which means that the assembler is given an inconsistent view
of the intended ABI.

For compiled code the issue will show itself as a warning from the assembler:
Warning: .gnu_attribute 4,3 requires `softfloat'

For hand-crafted assembler modules assembled via the compiler driver then the
ABI inferred by the assembler will be hard-float instead of soft-float owing to
the missing -msoft-float option.

A combination of these two problems will then lead to a link failure stating
that incompatible ABIs are in use. A number of other relocation errors can also
occur as a side effect.

(This bug was also filed as a binutils issue but is actually a compiler driver
issue: https://sourceware.org/bugzilla/show_bug.cgi?id=17219)


[Bug target/63848] [5.0 regression] FAIL: c-c++-common/torture/builtin-arith-overflow-17.c -O0 execution test

2014-11-14 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63848

Matthew Fortune matthew.fortune at imgtec dot com changed:

   What|Removed |Added

 CC||matthew.fortune at imgtec dot 
com

--- Comment #1 from Matthew Fortune matthew.fortune at imgtec dot com ---
I am seeing multiple failures for the builtin-arith-overflow tests on MIPS64 as
well. I have not yet had time to investigate much but it may be due to some
missing sign extension and/or performing a range test using bitwise comparison.


[Bug target/51729] dspr2-MULT.c and dspr2-MULTU.c fail for MIPS

2014-07-31 Thread matthew.fortune at imgtec dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51729

Matthew Fortune matthew.fortune at imgtec dot com changed:

   What|Removed |Added

 CC||clm at codesourcery dot com,
   ||matthew.fortune at imgtec dot 
com,
   ||sandra at codesourcery dot com

--- Comment #3 from Matthew Fortune matthew.fortune at imgtec dot com ---
The affected tests appear to have started passing again since this commit:

svn: r211959

2014-06-24  Catherine Moore  c...@codesourcery.com
Sandra Loosemore  san...@codesourcery.com

gcc/
* config/mips/mips.c (mips_order_regs_for_local_alloc): Delete.
* config/mips/mips.h (ADJUST_REG_ALLOC_ORDER): Delete.
* config/mips/mips-protos.h (mips_order_regs_for_local_alloc): Delete.

Irritatingly they now pass for -O2 and -O3 but still fail for -O1 and -Os. I
was half tempted to submit a patch to just be content that we get multiple
accumulators now used at -O2 and -O3 and leave it at that but I think that is
probably the wrong attitude.

Has anyone else noticed the change in state for these tests and looked into it,
planning to look into it or have any thoughts? I'll take a look when I get
chance.


[Bug rtl-optimization/58461] [MIPS] Using LRA instead of reload increases code size for mips16

2013-09-25 Thread matthew.fortune at imgtec dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58461

--- Comment #3 from Matthew Fortune matthew.fortune at imgtec dot com ---
(In reply to rsand...@gcc.gnu.org from comment #2)
 I think it'd be wrong for the backend to say that moves between
 MIPS16 registers and other general registers are more expensive
 than memory though.  

I'm not surprised by that comment. I agree. It was just an option I found that
avoided the need for changes to IRA.

 But of course, although the moves are cheap, registers other than
 $2-$7, $16, $17 and $24 are only useful as spill space.  So if
 it's better for IRA to ignore them and LRA to use them via
 TARGET_SPILL_CLASS, perhaps we should enforce that directly,
 by hiding other registers from IRA.  I suppose that's like
 restoring the old cover classes hook, but only as an optional
 feature.

One of the problems here is that IRA is coming up with M16_REGS as the
preferred class and GR_REGS as the alternate class for most pseudos. If there
is no cover class to translate GR_REGS into then that seems like a problem
unless it would work to use M16_REGS as a cover class for GR_REGS.
My initial attempt at this problem was to stop IRA from using the superunion of
preferred and alternate classes for an allocno and instead just using the
preferred class. This has the same effect as the change to register move costs
which was to get the alternate class to be NO_REGS (or ignored).

 I realise this isn't the point of the bug report or the attachment,
 but just FWIW: the constraints shouldn't be matching the fake
 FRAME_POINTER_REGNUM.  They should wait for it to be eliminated
 to either STACK_POINTER_REGNUM or HARD_FRAME_POINTER_REGNUM and
 match that.

I'll look at that when putting an 'enable LRA' patch together. There's lots of
testing to do yet.


[Bug rtl-optimization/58461] [MIPS] Using LRA instead of reload increases code size for mips16

2013-09-18 Thread matthew.fortune at imgtec dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58461

--- Comment #1 from Matthew Fortune matthew.fortune at imgtec dot com ---
Created attachment 30853
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30853action=edit
Patch to enable LRA for mips16


[Bug rtl-optimization/58461] New: [MIPS] Using LRA instead of reload increases code size for mips16

2013-09-18 Thread matthew.fortune at imgtec dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58461

Bug ID: 58461
   Summary: [MIPS] Using LRA instead of reload increases code size
for mips16
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matthew.fortune at imgtec dot com

Created attachment 30852
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30852action=edit
Test case to trigger LRA reload issue

While working on enabling LRA for MIPS (mips16 in particular) I have observed
that LRA is not producing as optimal code as classic reload. The underlying
cause of this is that the register allocation decisions made in IRA were
sub-optimal but classic reload 'fixes' them whereas LRA does not. Regardless of
fixing the IRA issues there is probably something to fix in LRA.

I have attached a patch that applies to top of tree to enable LRA for
mips/mips16 and exposes two options to demonstrate how LRA differs from classic
reload. I have also attached a test case (reload_test_mips16.i) which is a
function from libjpeg.

The two options added by the patch are -mreload and -mfix-regalloc. LRA is
default on with the patch applied:

* mips-sde-elf-gcc -Os -mips16 -march=m4kec reload_test_mips16.i

LRA introduces a number of reloads that involve $24 which is inaccessible to
most mips16 instructions leading to an increase in code size. 

* mips-sde-elf-gcc -Os -mips16 -march=m4kec -mreload ...

Classic reload manages to avoid the reloads of $24 as its reloads converge to
use the same reload register and eliminate $24 altogether.

* mips-sde-elf-gcc -Os -mips16 -march=m4kec -mfix-regalloc ...

LRA now outperforms classic reload as the initial register allocation by IRA is
better so LRA does not hit the problem I am reporting.

The original register allocation from IRA is:
Disposition:
   26:r245 l0 22:r246 l0 4   28:r249 l0 23:r250 l016
4:r251 l0175:r252 l0 86:r253 l0 9   19:r260 l011
   15:r266 l024   12:r275 l0 3   11:r276 l0 2   10:r278 l010
   27:r281 l0 47:r282 l0 58:r283 l0 61:r284 l0 7
   29:r285 l0   mem   24:r286 l024   25:r287 l0 2   23:r288 l024
   22:r289 l024   21:r290 l024   20:r291 l024   18:r292 l024
   17:r293 l011   16:r294 l011   14:r295 l011   13:r296 l024
9:r297 l0240:r298 l024

The fixed register allocation from IRA is as follows, note the mems instead of
hard registers 8-11,24:
Disposition:
   26:r245 l0 22:r246 l0 4   28:r249 l0 23:r250 l0   mem
4:r251 l0   mem5:r252 l0166:r253 l017   19:r260 l0   mem
   15:r266 l0   mem   12:r275 l0 3   11:r276 l0 2   10:r278 l0   mem
   27:r281 l0 47:r282 l0 58:r283 l0 61:r284 l0 7
   29:r285 l0   mem   24:r286 l0   mem   25:r287 l0 2   23:r288 l0   mem
   22:r289 l0   mem   21:r290 l0   mem   20:r291 l0   mem   18:r292 l0   mem
   17:r293 l0   mem   16:r294 l0   mem   14:r295 l0   mem   13:r296 l0   mem
9:r297 l0240:r298 l024

So the issue (I believe) is that reloads from LRA do not converge as well as
reloads introduced by classic reload. While this example can (and should) be
fixed in the original register allocation it feels as though there is a problem
to fix in LRA.

==

[mfortune@mfortune-linux lra_bugreport]$ /althome/mips/tk/bin/mips-sde-elf-gcc
-v
Using built-in specs.
COLLECT_GCC=/althome/mips/tk/bin/mips-sde-elf-gcc
COLLECT_LTO_WRAPPER=/althome/mips/tk/libexec/gcc/mips-sde-elf/4.9.0/lto-wrapper
Target: mips-sde-elf
Configured with: /althome/mips/git_br/gcc/configure --prefix=/althome/mips/tk
--target=mips-sde-elf --with-gnu-as --with-gnu-ld --with-arch=mips32r2
--with-mips-plt --with-synci --with-llsc --with-newlib
target_alias=mips-sde-elf --enable-languages=c,c++,lto
Thread model: single
gcc version 4.9.0 20130918 (experimental) (GCC)