Re: Zero extractions and zero extends

2010-02-11 Thread Adam Nemet
Jean Christophe Beyler  writes:
> typedef struct sTestUnsignedChar {
> uint64_t a:1;
> }STestUnsignedChar;
>
> uint64_t getU (STestUnsignedChar a)
> {
> return a.a;
> }
>
>
> I get this in the DCE pass :
> (insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
> (zero_extract:DI (reg/v:DI 73 [ a ])
> (const_int 1 [0x1])
> (const_int 0 [0x0]))) 63 {extzvdi} (expr_list:REG_DEAD
> (reg/v:DI 73 [ a ])
> (nil)))
>
> (insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
> (zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
> (expr_list:REG_DEAD (reg:QI 75)
> (nil)))
>
>
> (on the x86 port, I get an and instead of the zero_extract)
>
> However, in the combine pass both insns stay, whereas on the x86 port the
> zero_extend is removed.  Where is this decided exactly?
> I've checked the costs of the instructions; I have the same costs as on
> the x86 port.

Maybe it is turned into an (and:DI ... (const_int 1)) and you don't
recognize that?  Check your combine dump file; it should tell you what
pattern combine came up with while dealing with these two insns.
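
For reference, this is roughly the combined pattern I would expect combine to
try (a sketch based on your insns above, not an actual dump):

  (set (reg:DI 74)
       (and:DI (reg/v:DI 73) (const_int 1)))

If your port has no pattern for this form, or only recognizes it at a higher
cost than the original two insns together, combine will keep the zero_extract
and the zero_extend.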

Adam


Re: gen_lowpart called where 'truncate' needed?

2010-02-04 Thread Adam Nemet
Mat Hostetter writes:
> Adam Nemet  writes:
> 
> > Ian Lance Taylor  writes:
> > > Mat Hostetter  writes:
> > >
> > >> Since the high bits are already zero, that would be less efficient on
> > >> most platforms, so guarding it with something like this would probably
> > >> be smarter:
> > >>
> > >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> > >> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> > >>
> > >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> > >> not sure what that would be.  I could also believe these are obscure
> > >> edge cases no one cared about before.  Any tips would be appreciated.
> > >
> > > Interesting.  I think you are in obscure edge case territory.  Your
> > > suggestion makes sense to me, and in fact it should probably be put
> > > into gen_lowpart_common.
> > 
> > FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
> > !TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
> > MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
> > this case we always want a subreg not a truncate for better code.  So I
> > don't think gen_lowpart_common is the right place to fix this.
> >
> > I think the right fix is to call convert_to_mode or convert_move in the
> > expansion code which ensure the proper truncation.
> 
> That would yield correct code, but wouldn't it throw away the fact
> that the high bits are already known to be zero, and yield redundant
> zero-extension on some platforms?  I'm guessing that's why the code was
> originally written to call convert_lowpart rather than convert_to_mode.

convert_to_mode uses gen_lowpart for truncation if TRULY_NOOP_TRUNCATION.

> And just to add a minor wrinkle: for the widen_bswap case, which
> produces (bswap64 (x) >> 32), the optimal thing is actually to use
> ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
> SIGN_EXTEND, and then call convert_lowpart as now.

Agreed but we sort of do this due to this pattern in mips.md:

(define_insn "*lshr32_trunc"
  [(set (match_operand:SUBDI 0 "register_operand" "=d")
(truncate:SUBDI
  (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
   (const_int 32]
  "TARGET_64BIT && !TARGET_MIPS16"
  "dsra\t%0,%1,32"
  [(set_attr "type" "shift")
   (set_attr "mode" "")])


That said, with mode_rep_extended this optimization could now be performed by
the middle-end in simplify-rtx.c:

(truncate:SI (lshiftrt:DI .. 32))
->
(subreg:SI (ashiftrt:DI .. 32) )

if targetm.mode_rep_extended (SImode, DImode) == SIGN_EXTEND.
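
As a very rough, untested sketch (my own code, not an actual patch), the rule
could look something like this in simplify-rtx.c, where "op" is the operand of
the truncate and "mode" is the narrow mode:

  /* (truncate:SI (lshiftrt:DI X (const_int 32))) can use an arithmetic
     shift and a plain lowpart subreg when SImode values are always kept
     sign-extended in DImode registers.  */
  if (GET_CODE (op) == LSHIFTRT
      && CONST_INT_P (XEXP (op, 1))
      && INTVAL (XEXP (op, 1)) == (GET_MODE_BITSIZE (GET_MODE (op))
                                   - GET_MODE_BITSIZE (mode))
      && targetm.mode_rep_extended (mode, GET_MODE (op)) == SIGN_EXTEND)
    {
      rtx shifted = simplify_gen_binary (ASHIFTRT, GET_MODE (op),
                                         XEXP (op, 0), XEXP (op, 1));
      return simplify_gen_subreg (mode, shifted, GET_MODE (op),
                                  subreg_lowpart_offset (mode,
                                                         GET_MODE (op)));
    }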

Adam


Re: gen_lowpart called where 'truncate' needed?

2010-02-03 Thread Adam Nemet
Ian Lance Taylor  writes:
> Mat Hostetter  writes:
>
>> Since the high bits are already zero, that would be less efficient on
>> most platforms, so guarding it with something like this would probably
>> be smarter:
>>
>>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
>> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
>>
>> I'm happy to believe I'm doing something wrong in my back end, but I'm
>> not sure what that would be.  I could also believe these are obscure
>> edge cases no one cared about before.  Any tips would be appreciated.
>
> Interesting.  I think you are in obscure edge case territory.  Your
> suggestion makes sense to me, and in fact it should probably be put
> into gen_lowpart_common.

FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
!TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
this case we always want a subreg not a truncate for better code.  So I
don't think gen_lowpart_common is the right place to fix this.

I think the right fix is to call convert_to_mode or convert_move in the
expansion code which ensure the proper truncation.

Adam


Re: GCC-How does the coding style affect the insv pattern recognization?

2010-02-01 Thread Adam Nemet
fanqifei writes:
> 2010/1/18 Adam Nemet :
> > Sorry for jumping in late.  See make_field_assignment in combine.c.
> >
> > The problem usually is that:
> >
> >  (set A (ior (and B C1) OTHER))
> >
> > can only be turned into a bit-insertion if A and B happen to be the same
> > pseudos.
> >
> > Adam
> >
> I did find such a pattern for some simple C statements in the RTL dump.
> Unfortunately, A and B are not the same.
> Is it possible and easy to first move B to A and then replace B with A in
> the insn?
> Anyway, this is not required if it's impracticable.

It might work.  You could experiment with a define_split to do that.  Note,
however, that in terms of combine's notion of profitability you will need to
have eliminated some instructions by the end.  (Also, the overall cost of the
new insns should be lower than that of the old ones.)

Alternatively, you can recognize the above pattern and use a constraint to
force the pseudos to be allocated to the same hard register.  If that is
successful you can generate the bit-insertion; otherwise you split the
pattern back into the original insns after reload.

Adam


Re: GCC-How does the coding style affect the insv pattern recognization?

2010-01-17 Thread Adam Nemet
fanqifei  writes:
> Paolo Bonzini said that the insv instruction might be synthesized
> later by combine.  But combine only works on at most 3 instructions, and
> insv is not generated in this case.
> So when exactly will the insv pattern be recognized, and how does
> the coding style affect it?

Sorry for jumping in late.  See make_field_assignment in combine.c.

The problem usually is that:

 (set A (ior (and B C1) OTHER))

can only be turned into a bit-insertion if A and B happen to be the same
pseudos.
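
To make the condition on A and B concrete, compare these two (register
numbers invented for illustration).  The first one has A == B, so combine can
rewrite it as a bit-field insertion into (reg:SI 100):

  (set (reg:SI 100)
       (ior:SI (and:SI (reg:SI 100) (const_int -256))
               (reg:SI 101)))

In the second one the surviving bits come from a different pseudo than the
one being set, so no insv is generated:

  (set (reg:SI 102)
       (ior:SI (and:SI (reg:SI 100) (const_int -256))
               (reg:SI 101)))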

Adam


Re: LTO and the inlining of functions only called once.

2009-10-13 Thread Adam Nemet
Paul Brook  writes:

>> Nothing you've said changes  the fact that there are a class of users
>> for whom that information is vital and we ought to spend some time
>> thinking about how to provide the information in a form they can
>> digest.  GCC dumps as they exist today are largely useless for a non-GCC
>> developer to use to understand why a particular transformation did or
>> did not occur in their code.  This has come up time and time again and
>> will continue to do so unless we find a way to provide visibility into
>> the optimizer's decision making.
>
> My guess is anyone inspecting the code and optimizer decisions at this level 
> is also going to want to strongarm the result they want when the compiler 
> makes the "wrong" decision. i.e. detailed unroller diagnostics are only of 
> limited use without (say) #pragma unroll.

We would also increase the chances of getting more precise bug reports
rather than "my code is slower with GCC 4.5 than it was with GCC 4.4".
IOW, we could push some of the initial investigation work over to the
user :).

Also with VTA we will hopefully be in better shape referencing
source-level constructs.

Adam


Re: [LTO] Request for testing: Last merge from trunk before final merge

2009-09-30 Thread Adam Nemet
Diego Novillo  writes:
> If anyone has free cycles I would appreciate results from other
> ELF-capable targets.

The branch on mipsisa64-elf looks good (no regressions with languages
c,c++,objc):

  http://gcc.gnu.org/ml/gcc-testresults/2009-09/msg02717.html

baseline:

  http://gcc.gnu.org/ml/gcc-testresults/2009-09/msg02777.html

In addition to what has already been reported we have these failures in
the new lto tests:

FAIL: g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o-cp_lto_20081109-1_0.o link
FAIL: g++.dg/lto/20081118 cp_lto_20081118_0.o-cp_lto_20081118_1.o link
FAIL: g++.dg/lto/20081119-1 cp_lto_20081119-1_0.o-cp_lto_20081119-1_1.o link
FAIL: g++.dg/lto/20081120-1 cp_lto_20081120-1_0.o-cp_lto_20081120-1_1.o link
FAIL: g++.dg/lto/20081120-2 cp_lto_20081120-2_0.o-cp_lto_20081120-2_1.o link
FAIL: g++.dg/lto/20081123 cp_lto_20081123_0.o-cp_lto_20081123_1.o link
FAIL: g++.dg/lto/20081204-1 cp_lto_20081204-1_0.o-cp_lto_20081204-1_1.o link
FAIL: g++.dg/lto/20081219 cp_lto_20081219_0.o-cp_lto_20081219_1.o link
FAIL: g++.dg/lto/20090302 cp_lto_20090302_0.o-cp_lto_20090302_1.o link
FAIL: g++.dg/lto/20090313 cp_lto_20090313_0.o-cp_lto_20090313_1.o link
FAIL: gcc.dg/lto/20081120-1 c_lto_20081120-1_0.o-c_lto_20081120-1_1.o link
FAIL: gcc.dg/lto/20081120-2 c_lto_20081120-2_0.o-c_lto_20081120-2_1.o link
FAIL: gcc.dg/lto/20081204-1 c_lto_20081204-1_0.o-c_lto_20081204-1_1.o link
FAIL: gcc.dg/lto/20081212-1 c_lto_20081212-1_0.o-c_lto_20081212-1_0.o link

  /home/anemet/src/gcc-lto/mipsisa64-elf/gcc/../ld/ld-new:
  /home/anemet/src/gcc-lto/mipsisa64-elf/mipsisa64-elf/./libgloss/mips/crt0.o:
  relocation R_MIPS_HI16 against `hardware_hazard_hook' can not be used
  when making a shared object; recompile with -fPIC

FAIL: g++.dg/lto/20081109 cp_lto_20081109_0.o-cp_lto_20081109_1.o execute -O0 
-fwhopr

  terminate called after throwing an instance of 'int'

FAIL: g++.dg/lto/20081109 cp_lto_20081109_0.o-cp_lto_20081109_1.o link,  
(internal compiler error)
FAIL: g++.dg/lto/20090311-1 cp_lto_20090311-1_0.o-cp_lto_20090311-1_1.o link,  
(internal compiler error)
FAIL: gcc.dg/lto/20081112 c_lto_20081112_0.o-c_lto_20081112_1.o link,  
(internal compiler error)
FAIL: gcc.dg/lto/20081118 c_lto_20081118_0.o-c_lto_20081118_2.o link,  
(internal compiler error)

  lto1: internal compiler error: Segmentation fault
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See  for instructions.

FAIL: g++.dg/lto/20090311 cp_lto_20090311_0.o-cp_lto_20090311_1.o link
FAIL: g++.dg/lto/20090311 cp_lto_20090311_0.o-cp_lto_20090311_1.o link,  
(internal compiler error)
FAIL: g++.dg/lto/20090106 cp_lto_20090106_0.o-cp_lto_20090106_0.o link
FAIL: g++.dg/lto/20090106 cp_lto_20090106_0.o-cp_lto_20090106_0.o link

  (It's strange that these have the same name.)

  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xbc): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND1Ev.1183.1258'
  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xc4): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND0Ev.1236.1255'
  collect2: ld returned 1 exit status
  compiler exited with status 1
  output is:
  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xbc): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND1Ev.1183.1258'
  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xc4): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND0Ev.1236.1255'
  collect2: ld returned 1 exit status


Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Adam Nemet
Richard Henderson writes:
> On 09/01/2009 12:48 PM, Adam Nemet wrote:
> > I see.  So I guess you're saying that there is little chance to optimize the
> > loop I had in my previous email ;(.
> 
> Not at the rtl level.  Gimple-level loop splitting should do it though.
> 
> > Now suppose we split late, shouldn't we still assume that data-flow can 
> > change
> > later.  IOW, wouldn't we be required to use the literal/lituse counting that
> > alpha does?
> 
> If you split post-reload, data flow isn't going to change
> in any significant way.
> 
> > If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has 
> > the
> > benefit that it does not deem indirect calls as different when cross-jumping
> > compares the insns.  I don't know how important this is though.
> 
> It depends on how much benefit you get from the direct
> branch.  On alpha it's quite a bit, so we work hard to
> make sure that we can get one, if at all possible.

Thanks, RTH.

RichardS,

Can you comment on what RTH is suggesting?  Besides cross-jumping I haven't
seen indirect PIC calls get optimized much, and it seems that splitting late
will avoid the data-flow complications.

I can experiment with this but it would be nice to get some early buy-in.

BTW, I have the R_MIPS_JALR patch ready for submission but if we don't need to
worry about data-flow changes then using MEM_EXPR is not necessary.

Adam


Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Adam Nemet
Richard Henderson writes:
> On 08/28/2009 12:38 AM, Adam Nemet wrote:
> > ... To assist the linker we need to annotate the indirect call
> > with the function symbol.
> >
> > Since the call is expanded early...
> 
> Having experimented with this on Alpha a few years back,
> the only thing I can suggest is to not expand them early.
> 
> I use a combination of peep2's and normal splitters to
> determine if the post-call GP reload is needed, and to
> expand the call itself.

I see.  So I guess you're saying that there is little chance to optimize the
loop I had in my previous email ;(.

Now suppose we split late; shouldn't we still assume that data flow can change
later?  IOW, wouldn't we be required to use the literal/lituse counting that
alpha does?

If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has the
benefit that it does not deem indirect calls as different when cross-jumping
compares the insns.  I don't know how important this is though.

Adam


Using MEM_EXPR inside a call expression

2009-08-28 Thread Adam Nemet
On MIPS, PIC calls are indirect calls that need to be dispatched via an ABI
mandated register.  At expansion time we load the symbol into a pseudo then
expand the call.  There is a linker optimization that can turn these indirect
calls into direct calls under some circumstances.  This can improve branch
prediction (the real ABI requirement is that the PIC register is live on entry
to the callee).  To assist the linker we need to annotate the indirect call
with the function symbol.

Since the call is expanded early, during the various optimizations the meaning
of the call can change to no longer refer to the original function.  E.g.:

f (int i, int j)
{
  while (i--)
    if (j)
      g ();
    else
      h ();
}

You would hope that GCC would move the condition out of the loop and preset
the PIC register accordingly with only the indirect call in the loop body.
(No, this does not happen ATM.)
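
In source terms, the code I am hoping GCC would effectively generate looks
roughly like this (a sketch, not actual GCC output):

  extern void g (void);
  extern void h (void);

  void
  f (int i, int j)
  {
    /* Call target (and hence the PIC register) preset once, outside
       the loop.  */
    void (*fn) (void) = j ? g : h;

    while (i--)
      (*fn) ();   /* only the indirect call remains in the loop body */
  }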

I have, however, seen this happen with cross-jumping.

AFAICT we have two options: we either create a simple (local) dataflow pass in
md_reorg and annotate the calls with the result, or we set the MEM_EXPR of the
mem inside the call to the function decl during expansion.

I have a patch for the latter (I used to do the former in our GCC).  It seems
to me that this perfectly fits with the definition of MEM_EXPR but I don't
think MEM_EXPR is ever used for mems inside calls.  In fact, I don't think any
of the MEM_ATTRS are meaningful in a call expression.

What's promising is that cross-jumping treats them correctly.  As it merges
MEM_ATTRS, it clears mismatching MEM_EXPRs no matter where the mem expression
is found in the insn.  And my patch bootstraps successfully.

So, is using MEM_EXPR for this a bad idea?

Adam


Re: IRA undoing scheduling decisions

2009-08-25 Thread Adam Nemet
Charles J. Tabony writes:
> Filed as PR 41171.

Thanks.

> Is this actually a performance regression on MIPS?  I suspect either
> sequence would be the same performance on most machines.

Yes it is: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171#c1

Adam


Re: IRA undoing scheduling decisions

2009-08-24 Thread Adam Nemet
"Charles J. Tabony"  writes:
> I see the same difference between GCC 4.3.2 and 4.4.0 when compiling
> for PowerPC, MIPS, ARM, and FR-V.

I can confirm this with mainline on MIPS/Octeon.  Can you please file a
bug.

Adam


Re: Address as HImode when Pmode is QImode?

2009-08-13 Thread Adam Nemet
Markus L  writes:
> I run into an assert in convert_memory_address not being able to
> convert the address rtx (being HImode) into Pmode (i.e. QImode).  A
> few frames up I can dump the tree node, and it looks like the
> address calculations are done in HImode.  Why is the address being
> calculated as unsigned long int and not as unsigned int, which would be
> Pmode for my target?  Is this expected (and my problem originates
> elsewhere), or am I missing something obvious here?

What's your sizetype?  This could be related to:

http://gcc.gnu.org/ml/gcc/2008-05/msg00313.html

Adam


Re: Code optimization with GCSE

2009-07-16 Thread Adam Nemet
Jean Christophe Beyler writes:
> As we can see, all three are using the symbol_ref data before adding
> their offset. But after cse, we get this:
> 
> (insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
> (const:DI (plus:DI (symbol_ref:DI ("data") )
> (const_int 8 [0x8])))) 71 {movdi_internal} (nil))
> 
> (insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
> (symbol_ref:DI ("data") )) 71
> {movdi_internal} (nil))

You will have to debug CSE.  The actual replacement is done in this loop in
cse_insn:

  /* Terminate loop when replacement made.  This must terminate since
     the current contents will be tested and will always be valid.  */
  while (1)
    {

Adam


Re: Code optimization with GCSE

2009-07-15 Thread Adam Nemet
Jean Christophe Beyler  writes:
> uint64_t foo (void)
> {
> return data[0] + data[1] + data[2];
> }
>
> And this generates :
>
> la  r9,data
> la  r7,data+8
> ldd r6,0(r7)
> ldd r8,0(r9)
> ldd r7,16(r9)
>
> I'm trying to see if there is a problem with my rtx costs function
> because again, I don't understand why it would generate 2 la instead
> of using an offset of 8 and 16.

You probably want to look at the RTL dumps.  This code should have been
expanded with the correct offsets (at least that is what happens on
MIPS).  I don't see how later passes would modify the code other than
removing 2 of the 3 "la rX, data" insns.
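
In other words, I would expect you to end up with a single la and the offsets
folded into the loads, something like this (a sketch, using the register
assignments from your output above):

  la  r9,data
  ldd r8,0(r9)
  ldd r6,8(r9)
  ldd r7,16(r9)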

Adam


Re: Phase 1 of gcc-in-cxx now complete

2009-06-27 Thread Adam Nemet
Ian Lance Taylor  writes:
> I would like to encourage people to try using --enable-build-with-cxx in
> other configurations--other bootstraps, cross-compilers--to see how well
> it works.  Please let me know if you run into problems that you don't
> know how, or don't have time, to fix.

MIPS bootstraps fine with --enable-build-with-cxx:

  http://gcc.gnu.org/ml/gcc-testresults/2009-06/msg02323.html

I don't know if the new failures are related to C++; I will do a C build
later and compare.

Ian, thanks for your C++ work!

Adam


Re: Phase 1 of gcc-in-cxx now complete

2009-06-26 Thread Adam Nemet
Ian Lance Taylor  writes:
> I would like to encourage people to try using --enable-build-with-cxx in
> other configurations--other bootstraps, cross-compilers--to see how well
> it works.  Please let me know if you run into problems that you don't
> know how, or don't have time, to fix.

With GMP 4.2.1 build fails like this:

g++ -mabi=n32 -c  -g -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc 
-I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  
-I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd 
-I../libdecnumber
../../src/gcc/c-lang.c -o c-lang.o
In file included from ../../src/gcc/double-int.h:24,
 from ../../src/gcc/tree.h:30,
 from ../../src/gcc/c-lang.c:27:
//usr/include/gmp.h:515: error: 'std::FILE' has not been declared
make[3]: *** [c-lang.o] Error 1
make[3]: Leaving directory `/mnt/src/gcc-tmp/mips64octeon-linux/gcc'
make[2]: *** [all-stage1-gcc] Error 2
make[2]: Leaving directory `/mnt/src/gcc-tmp/mips64octeon-linux'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/mnt/src/gcc-tmp/mips64octeon-linux'
make: *** [all] Error 2

We need at least GMP 4.2.3 with --enable-build-with-cxx:

From the GMP release notes:

Changes in GMP 4.2.3
Bugs:

...
* For C++, gmp.h now includes cstdio, improving compiler
compatibility. 
...

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-17 Thread Adam Nemet
Ian Lance Taylor writes:
> I'm not entirely sure, but I don't think Jim said the opposite.  He said
> that the way truncate works is machine dependent.  I said that the
> output of truncate is machine independent.  Since truncate is only
> defined for fixed-point modes, I think both statements are true.

OK but in that way every operation is machine dependent not just truncate.
BTW, why is being fixed-point relevant here?

From that little excerpt I just gathered that maybe my misunderstanding of
treating truncate as a black box was not completely without precedent.  But
anyway, I think I understand now.  JTBS, can you agree with the other
statement in my email?:

> And IIUC it is this don't-care nature of the other bits that allows backends
> to define the upper bits, for example to have sign-bit copies there in
> registers to enforce the MIPS64 SImode representation.  And treating the
> don't-care bits outside SImode in this way holds for any other SImode
> operation performed on registers, not just truncate, right?  Hmm, nice.

Thanks for all the explanations.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
Ian Lance Taylor writes:
> truncate has a machine independent meaning.

Yes, I guess with your definition below it does.  It's interesting though that
Jim had said the opposite in the excerpts posted by Jeff:

  And a later message from Jim:
  
  Truncate converts a value from a larger to a smaller mode in a machine
  dependent way.
  
  A subreg just chops off the high bits.  A truncate does not.  The name might
  be a little confusing, but the whole point of truncate is to have something
  that operates differently than a subreg.
  
  Combine is clearly making an invalid transformation.
 
> Yes.  The bits in Nmode's mask are determined by the truncate.  The
> other bits are don't-care.  If the result of the truncate happens to
> wind up in a register, then in some cases PROMOTE_MODE will apply.

And IIUC it is this don't-care nature of the other bits that allows backends
to define the upper bits, for example to have sign-bit copies there in
registers to enforce the MIPS64 SImode representation.  And treating the
don't-care bits outside SImode in this way holds for any other SImode
operation performed on registers, not just truncate, right?  Hmm, nice.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
Jeff Law writes:
> Ian Lance Taylor wrote:
> > Adam Nemet  writes:
> >
> >   
> >> I am trying to understand the checkin by Jeff Law from about 11 years ago:
> >>
> >>r19204 | law | 1998-04-14 01:04:21 -0700 (Tue, 14 Apr 1998) | 4 lines
> >>
> >>
> >>* combine.c (simplify_rtx, case TRUNCATE): Respect value of
> >>TRULY_NOOP_TRUNCATION.
> >>
> >>
> >>Index: combine.c
> >>===
> >>--- combine.c   (revision 19018)
> >>+++ combine.c   (revision 19204)
> >>@@ -3736,7 +3736,9 @@ simplify_rtx (x, op0_mode, last, in_dest
> >>   if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
> >>break;
> >> 
> >>-  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
> >>+  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT
> >>+ && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
> >>+   GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0)))))
> >>SUBST (XEXP (x, 0),
> >>   force_to_mode (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
> >>  GET_MODE_MASK (mode), NULL_RTX, 0));
> >>
> >> This optimization simplifies the input to a truncate by only computing bits
> >> that won't be eliminated by the truncation.  Normally these are the bits in
> >> the output mode mask.  Note that the optimization does not change the 
> >> truncate
> >> into a low-part subreg, which would pretty automatically warrant the
> >> TRULY_NOOP_TRUNCATION check.
> >> 
> >
> > I agree that this patch looks wrong in todays compiler.  There should be
> > no need to call TRULY_NOOP_TRUNCATION if you are in a TRUNCATE anyhow.
> >   
> Based on reviewing my old notes, we do have to ensure that combine 
> doesn't replace a TRUNCATE with a SUBREG as that can result in having a 
> 32bit value that isn't sign-extended, which clearly causes MIPS64 ports 
> grief.

Thanks for the long information in your other reply.

As I said in the original email, we are not turning a TRUNCATE into a SUBREG
in this transformation.  We're just simplifying the truncate's input
expression with the knowledge that only the bits in the truncated mode are
relevant from the input.  At the end we are still left with a truncate
expression, but possibly with a simpler operand.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
Ian Lance Taylor writes:
> I agree that this patch looks wrong in todays compiler.  There should be
> no need to call TRULY_NOOP_TRUNCATION if you are in a TRUNCATE anyhow.

Thanks.

Do you think we can assume this for TRUNCATEs in general or only for MIPS-like
TRUNCATEs?

I can't think of why it would be useful to represent a mode so that the bits
outside the mode mask neither depend on the bits inside the mask nor are
don't-care bits.  IOW, can we assume that for any TRUNCATE from a wider mode W
to a narrower mode N the following holds:

  (truncate:N expr:W) == (truncate:N (and:W expr:W GET_MODE_MASK(Nmode)))

?  Here == does not necessarily mean an identical bit representation of the
object holding the value (e.g. QI/HI values on MIPS), but that the two are
indistinguishable in the operations that are defined on them.
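
For instance, instantiating the above with W = DImode and N = SImode:

  (truncate:SI (reg:DI X))
    == (truncate:SI (and:DI (reg:DI X) (const_int 0xffffffff)))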

Adam


Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
I am trying to understand the checkin by Jeff Law from about 11 years ago:

   r19204 | law | 1998-04-14 01:04:21 -0700 (Tue, 14 Apr 1998) | 4 lines
   
   
   * combine.c (simplify_rtx, case TRUNCATE): Respect value of
   TRULY_NOOP_TRUNCATION.
   
   
   Index: combine.c
   ===
   --- combine.c(revision 19018)
   +++ combine.c(revision 19204)
   @@ -3736,7 +3736,9 @@ simplify_rtx (x, op0_mode, last, in_dest
  if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
break;

   -  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
   +  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT
   +  && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
   +GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0)))))
SUBST (XEXP (x, 0),
   force_to_mode (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
  GET_MODE_MASK (mode), NULL_RTX, 0));

This optimization simplifies the input to a truncate by only computing bits
that won't be eliminated by the truncation.  Normally these are the bits in
the output mode mask.  Note that the optimization does not change the truncate
into a low-part subreg, which would pretty automatically warrant the
TRULY_NOOP_TRUNCATION check.

Specifically, I am curious whether this was prompted by MIPS (or SH) or maybe
some other target that has a notion of truncate different from MIPS's.

force_to_mode was broken WRT truncate before my patch in 2006:

  http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00553.html

so maybe Jeff's patch was just trying to avoid the call to force_to_mode.

Anyhow, I don't think that the above transformation is illegal for MIPS64.

When truncating to SI mode the bits outside the resulting SI mode will all be
copies of the SI mode sign bit so their input values are obviously irrelevant.

For QI and HI modes the representation requires the upper 32 bits of the
64-bit word to be copies of bit 31.  The transformation might change bit 31,
so it can affect the resulting bits in the 64-bit word.  This, however, should
have no visible effect.  Operations on QI- and HI-mode values are performed in
SI mode, with proper extensions before the operation if bits outside the mode
would affect the result.

I also tried a few things after removing the above check.  Bootstrapping and
regtesting on mips64octeon-linux were successful, as was regtesting on
mipsisa64r2-elf.  The assembly output of gcc.c-torture/execute/*.c improved
as follows:

  13 files changed, 73 insertions(+), 99 deletions(-)

The changes typically remove zero and sign-extensions before a subsequent
truncation.

Any further information on this subject would be very much appreciated.

Adam


Re: increasing the number of GCC reviewers

2009-06-09 Thread Adam Nemet
Andrew Haley  writes:
> We need something more like "I think Fred Bloggs knows gcc well enough
> to approve patches to reload" or "I am Fred Bloggs and I know gcc well
> enough to approve patches to reload."

And to whom should such an email be sent?  The SC is best reached on gcc@,
but I don't think that recommending someone publicly is necessarily a good
idea.  E.g. what if the SC does not appoint the person; does that mean that
the SC decided that he or she was not qualified enough?

IMO the best way would be to nominate someone to the SC directly and
then if the SC decides to support the nomination they can check with the
person if he or she would accept the appointment.

Adam


Re: Bootstrap failures on solaris

2009-06-09 Thread Adam Nemet
Revital1 Eres  writes:
> Hello,
>
>> i386-pc-solaris:
>>
>> cc1: warnings being treated as errors
>> /export/home/arth/gnu/gcc.git/gcc/tree-ssa-loop-prefetch.c: In function
>> 'loop_prefetch_arrays':
>> /export/home/arth/gnu/gcc.git/gcc/tree-ssa-loop-prefetch.c:1589:7: error:
>> format '%ld' expects type 'long int', but argument 5 has type 'long long
>> int'
>> make[3]: *** [tree-ssa-loop-prefetch.o] Error 1
>> make[3]: *** Waiting for unfinished jobs
>
> I get this error on ppc also.

See http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00712.html

Adam


STRIP_NOPS and lower-precision/bit-field types

2009-06-04 Thread Adam Nemet
In this testcase:

  struct s
  {
    unsigned long a:2;
    unsigned long b:40;
    unsigned long c:22;
  };

  f (struct s s)
  {
    g (((unsigned long) (s.b-8)) + 8);
  }

fold breaks the code.  Specifically, after a few transformations we're asked
to fold this.  (I changed the big constant to be in symbolic form for better
understanding):

 D.40719;
   (uint64_t) (D.40719 + ((1<<40) - 8)) + 8

The cast is removed by STRIP_NOPS, so with explicit precision and signedness
on the operations we have (the constants are of the types of the corresponding
operations):

   (D.40719 +:u40 ((1<<40) - 8)) +:u64 8

The split_trees/associate_trees code combines the two constants, resulting in:

  (uint64_t) D.40719 +:64 1<<40

Ouch.

The problem is obviously that we remove a zero-extension that would ensure the
modulo-2^40 nature of the addition (remove the overflow).
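
To make the breakage concrete (my own numbers): with s.b == 8 the original
expression evaluates to

  (uint64_t) ((8 + ((1<<40) - 8)) mod 2^40) + 8  ==  (uint64_t) 0 + 8  ==  8

whereas the folded expression gives

  (uint64_t) 8 + (1<<40)  ==  0x100,0000,0008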

I see two ways to fix this:

1. Change STRIP_NOPS not to remove the cast if it's changing the precision
(not just when it changes machine modes).  And then audit code quality where
we need to and are allowed to look through these casts.

The above case does not qualify because the first operation may not overflow
into the bits beyond the low 40, whereas the second may.  For example, with 2
as s.b the subtraction should yield 0xff,ffff,fffa and the addition
0x100,0000,0002.  (Note that there is *no* default integer promotion on
bit-field types wider than int [1].)

2. Just fix this particular case of folding (split_trees/associate_trees) and
hope everything else works ;).

I am for 1, FWIW, but I know this issue has some history [2], so I'd like to
hear others' opinions before I start working on a patch.

Adam

[1] http://gcc.gnu.org/ml/gcc/2008-01/msg00215.html
[2] http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01478.html


Re: From regno to pseudo?

2009-06-03 Thread Adam Nemet
Steven Bosscher  writes:
> Is there a way to get the REG for a given regno?  I am building a
> register renumbering map that is just a pair of unsigned ints, but I can't
> figure out how to get the REG for new_regno without remembering a pointer
> to it myself.  Is there an easier/better way?

regno_reg_rtx in emit-rtl.c?
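
For example (a minimal sketch, assuming new_regno is a pseudo that already has
an associated REG):

  rtx new_reg = regno_reg_rtx[new_regno];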

Adam


Re: Forgetting return values

2009-05-28 Thread Adam Nemet
Jamie Prescott writes:
> > From: Adam Nemet 
> > Jamie Prescott writes:
> > > static inline int set_prop(char const *path, char const *name,
> > >   void const *data, int size)
> > > {
> > > int error;
> > >
> > > asm volatile ("int\t11\n\t"
> > >   : "=a0" (error): "a0" (path), "a1" (name), "a2" 
> > > (data),
> > > "a3" (size));
> > >
> > > return error;
> > > }
> > >
> > > extern int calc(int);
> > >
> > > int proc(int i)
> > > {
> > > int j = calc(i);
> > >
> > > return set_prop(0, 0, &j, sizeof(int));
> > > }
> > ...
> > >
> > > Why is the memory clobber required, and why GCC does not understand to
> > > sync the value to memory when passing the address to a function?
> > 
> > Because you never inform GCC that you will use the value at
> > address *NAME.  Try to use "m"(*name) rather than "a1"(name) in the asm.
> 
> That's 'data', not 'name'.  But OK, got it.  Unfortunately, I cannot use "m"
> since that value needs to go into a specific register.
> Any other solution?

You can also have it *in addition* as an argument to the asm that's never
used:

asm volatile ("int\t11\n\t"
  : "=a0" (error): "a0" (path), "a1" (name), "a2" (data),
"a3" (size), "m"(*data));

after changing data's type into something that can be dereferenced.
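
Putting it together, something like this (an untested sketch; I changed data's
type to unsigned char const * so that *data can be used as a memory operand):

  static inline int set_prop(char const *path, char const *name,
                             unsigned char const *data, int size)
  {
      int error;

      /* The extra "m"(*data) input tells GCC that the asm reads the memory
         DATA points to, so the store of j is not optimized away.  */
      asm volatile ("int\t11\n\t"
                    : "=a0" (error)
                    : "a0" (path), "a1" (name), "a2" (data),
                      "a3" (size), "m" (*data));

      return error;
  }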

Adam


Re: Forgetting return values

2009-05-28 Thread Adam Nemet
Jamie Prescott  writes:
> static inline int set_prop(char const *path, char const *name,
>   void const *data, int size)
> {
> int error;
>
> asm volatile ("int\t11\n\t"
>   : "=a0" (error): "a0" (path), "a1" (name), "a2" (data),
> "a3" (size));
>
> return error;
> }
>
> extern int calc(int);
>
> int proc(int i)
> {
> int j = calc(i);
>
> return set_prop(0, 0, &j, sizeof(int));
> }
...
>
> Why is the memory clobber required, and why GCC does not understand to
> sync the value to memory when passing the address to a function?

Because you never inform GCC that you will use the value at
address *NAME.  Try to use "m"(*name) rather than "a1"(name) in the asm.

Adam


Re: cleanup tests failing on MIPS64

2009-04-12 Thread Adam Nemet
John David Anglin writes:
> The same tests now fail on hppa.  This is PR 39651.  I'm fairly certain
> this was introduced by the following change:

I put this PR in the checkin that was just approved on gcc-patc...@.  Please
close the bug if it fixes the failures on hppa too.

Adam


Re: cleanup tests failing on MIPS64

2009-04-11 Thread Adam Nemet
Adam Nemet  writes:
> I am not exactly sure what has exposed this but the bug seems to be old.
> can_throw_external in except.c does not look at the branch delay slot (second
> entry in a SEQUENCE) to determine whether the insn may throw or not.
>
> In gcc.dg/cleanup-8.c for example after inlining fn3, the trapping store is
> moved to the delay slot of abort, which is a nothrow function so we decide
> that fn2 can't throw and then ultimately remove the eh region around fn1 in
> fn0.

I forgot to mention that I started testing a patch.

Adam


Re: cleanup tests failing on MIPS64

2009-04-11 Thread Adam Nemet
Adam Nemet  writes:
> For two testresults now the cleanup tests are failing in both gcc and g++:
>
>   http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg01031.html
>   http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg00592.html
>
> I waited for another testresults because there were some bug fixes in this
> area after the eh changes.
>
> Does somebody know what's going on?  I'll look at it otherwise.

I am not exactly sure what has exposed this but the bug seems to be old.
can_throw_external in except.c does not look at the branch delay slot (second
entry in a SEQUENCE) to determine whether the insn may throw or not.

In gcc.dg/cleanup-8.c, for example, after inlining fn3 the trapping store is
moved into the delay slot of abort, which is a nothrow function, so we decide
that fn2 can't throw and then ultimately remove the EH region around fn1 in
fn0.
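
The kind of check that seems to be missing, as a rough sketch (not the actual
patch I am testing): when an insn is a SEQUENCE, every element, including the
one in the delay slot, has to be considered, e.g. something along these lines:

  if (GET_CODE (PATTERN (insn)) == SEQUENCE)
    {
      rtx seq = PATTERN (insn);
      int i;

      /* Look at the branch *and* the delay-slot insn.  */
      for (i = 0; i < XVECLEN (seq, 0); i++)
        if (can_throw_external (XVECEXP (seq, 0, i)))
          return true;
      return false;
    }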

Adam


cleanup tests failing on MIPS64

2009-04-10 Thread Adam Nemet
For two testresults now the cleanup tests are failing in both gcc and g++:

  http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg01031.html
  http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg00592.html

I waited for another testresults because there were some bug fixes in this
area after the eh changes.

Does somebody know what's going on?  I'll look at it otherwise.

Adam


Re: fwprop and CSE const anchor opt

2009-04-08 Thread Adam Nemet
Thank you very much.  This was very informative.

Richard Sandiford writes:
> If we have an instruction:
> 
> A: (set (reg Z) (plus (reg X) (const_int 0xdeadbeef)))
> 
> we will need to use something like:
> 
>(set (reg Y) (const_int 0xdead))
>(set (reg Y) (ior (reg Y) (const_int 0xbeef)))
> B: (set (reg Z) (plus (reg X) (reg Y)))
> 
> But if A is in a loop, the Y loads can be hoisted, and the cost
> of A is effectively the same as the cost of B.  In other words,
> the (il)legitimacy of the constant operand doesn't really matter.

My guess is that, since A is not a recognizable insn, this is relevant at RTL
expansion time.  Is this correct?

> In summary, the current costs generally work because:
> 
>   (a) We _usually_ only apply costs to arbitrary instructions
>   (rather than candidate instruction patterns) before
>   loop optimisation.

I don't think I understand this point.  I see the part that the cost is
typically queried before loop optimization but I don't understand the
distinction between "arbitrary instructions" and "candidate instruction
patterns".  Can you please explain the difference?

>   (b) It doesn't matter what we return for invalid candidate
>   instruction patterns, because recog will reject them anyway.
> 
> So I suppose my next question is: are you seeing this problem with cse1
> or cse2?  The reasoning behind the zero cost might still be valid for
> REG_EQUAL notes in cse1.  However, it's probably not right for cse2,
> which runs after loop hoisting.

I am seeing it with both, so at least at cse2 we could do it with this.

> Perhaps we could add some kind of context parameter to rtx_costs
> to choose between the hoisting and non-hoisting cost.  As well as
> helping with your case, it could let us use the non-hoisting cost
> before loop optimisation in cases where the insn isn't going to
> go in a loop.  The drawback is that we then have to replicate
> even more of the .md file in rtx_costs.
> 
> Alternatively, perhaps we could just assume that rtx_costs always
> returns the hoisted cost when optimising for speed, in which case
> I think your alternative solution would be theoretically correct
> (i.e. not a hack ;)).

OK, I think I am going to propose this in the patch then.  It might still be
interesting to experiment with providing more context to rtx_costs.

> E.g. suppose we're deciding how to implement an in-loop multiplication.
> We calculate the cost of a multiplication instruction vs. the cost of a
> shift/add sequence, but we don't consider whether any of the backend-specific
> shift/add set-up instructions could be hoisted.  This would lead to us
> using multiplication insns in cases where we don't want to.
> 
> (This was one of the most common situations in which the zero cost helped.)

I am not sure I understand this.  Why would we decide to hoist suboperations
of a multiplication?  If it is loop-variant then even the suboperations are
loop-variant whereas if it is loop-invariant then we can hoist the whole
operation.  What am I missing?

Adam


Re: fwprop and CSE const anchor opt

2009-04-05 Thread Adam Nemet
Richard Sandiford writes:
> Adam Nemet  writes:
> > * Synthesizing multi-insn constants from const anchors makes sense
> > regardless of whether the constant is used as an address, so I am adjusting
> > the cost system to make those stick independent of the context.  I still
> > need to benchmark this.
> 
> Out of interest, what sort of changes did you need to make?  Do you care
> about the cases where rtx_costs is applied to a pattern that won't later
> be checked by recog (such as the calls from the expander)?  Or do you
> just care about the cases where rtx_costs is used to check whether a
> validate_change would be profitable?

I am glad you asked because I myself was going to ask you some questions in
this area ;).

I think the best way to formulate the profitability question for multi-insn
constants in CSE is this.  We found an expression using a const-anchor that is
equivalent to the REG_EQUAL note on the insn.  We go with the const-anchor
expression if it is cheaper than the REG_EQUAL expression.

Comparing the cost of the actual source in the insn against the const-anchor
expression would ignore the fact that the preceding instructions building up
the constants become redundant with the const-anchor expression.

I also have an alternative solution that feels more hackish and gives less
control to the backend but works with the current cost model.  For a
multi-insn constant if the current cost and the cost of the new expression are
equal we prefer the new expression.  This works for MIPS because AFAICT we
always replace a binary expression with another one of the same cost.

The problem with the first approach, and I feel you were anticipating this
with your question, is that multi-insn constants currently have a cost of zero
in the MIPS backend.  Or, with the code from mips_rtx_costs:

case CONST_INT:
[...]
  /* When not optimizing for size, we care more about the cost
 of hot code, and hot code is often in a loop.  If a constant
 operand needs to be forced into a register, we will often be
 able to hoist the constant load out of the loop, so the load
 should not contribute to the cost.  */
  if (!optimize_size
  || mips_immediate_operand_p (outer_code, INTVAL (x)))
{
  *total = 0;
  return true;
}

That is really hard to beat for an immediate add, which is COSTS_N_INSNS (1).

Since you already offered :), can you please explain where the above code is
needed and whether we can refine the picture a little bit?  It definitely
feels like the above cost value is only true in certain contexts.  This might
be a naive comment, but maybe somehow limiting the effect to that particular
context would be a good improvement.

Adam


Re: fwprop and CSE const anchor opt

2009-04-04 Thread Adam Nemet
Richard Sandiford writes:
> Adam Nemet  writes:
> > In order for my CSE const anchor patch to work I needed to drastically lower
> > the cost of immediate addition in the MIPS backend.  This was acceptable as 
> > a
> > proof of concept but not in general of course.
> >
> > The problem is with "single-insn"/simple constants.  We would also like 
> > these
> > to use const anchors in the hope that the resulting expression would get
> > propagated into MEM expressions.  I was hoping that fwprop would do this
> > propagation for me.  However, since a single-insn constant load is cheaper
> > than an immediate addition (make sense), fwprop is free to propagate either
> > (1) into (2) or (2) into (3) here:
> >
> >   (1) a <- C
> >   |
> >   +--> (2) b <- a + D
> >   ||
> >   |+--> (3) mem(b)
> >   |
> >   +--> (4) use(a)
> >
> > Which one it does depends on which one it tries first.  Right now we go in
> > insn order so we'd do (1) to (2) preventing to do (2) to (3) later.
> 
> Probably a dumb question, sorry, but is this only a problem for MIPS when
> C + D is in the range [0x8000, 0x]?  Or is the (1)->(2) substitution
> somehow succeeding in other cases?

Right and no.  I wasn't very precise saying single-insn constants; I should
have said single-insn constants produced with ORI using the zero register.
(The testcase in PR/33699 is really good :).)

Just to recap the other cases:

* Synthesizing multi-insn constants from const anchors makes sense regardless
of whether the constant is used as an address, so I am adjusting the cost
system to make those stick independent of the context.  I still need to
benchmark this.

* For single-insn constants produced with ADDIU using the zero register, we
don't need const anchors because they can already be used in addresses
offsetting the zero register.

Adam


fwprop and CSE const anchor opt

2009-04-02 Thread Adam Nemet
In order for my CSE const anchor patch to work I needed to drastically lower
the cost of immediate addition in the MIPS backend.  This was acceptable as a
proof of concept but not in general of course.

The problem is with "single-insn"/simple constants.  We would also like these
to use const anchors in the hope that the resulting expression would get
propagated into MEM expressions.  I was hoping that fwprop would do this
propagation for me.  However, since a single-insn constant load is cheaper
than an immediate addition (which makes sense), fwprop is free to propagate either
(1) into (2) or (2) into (3) here:

  (1) a <- C
  |
  +--> (2) b <- a + D
  ||
  |+--> (3) mem(b)
  |
  +--> (4) use(a)

Which one it does depends on which one it tries first.  Right now we go in
insn order so we'd do (1) to (2) preventing to do (2) to (3) later.

It seems to me that it would be preferable if fwprop tried propagating into
single uses first before trying to propagate into others.  This should improve
the chances of making more defs redundant.  With the example above, (2) would
be redundant after propagation.
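
In other words, for the example above the outcome I would prefer is (sketch):

  (1) a <- C
  (3) mem(a + D)      ; (2) propagated into its single use and now dead
  (4) use(a)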

Am I missing something?  If this sounds reasonable I can try to work out a
patch and see what happens.

Adam


Re: Constant folding and Constant propagation

2009-03-16 Thread Adam Nemet
Jean Christophe Beyler writes:
> I set up your patch and I get an internal error on this test program:

You're right.  I haven't handled the case properly when the constant itself
was an anchor constant (e.g. 0).  Try this version.

Adam


* cse.c (get_const_anchors): New function.
(insert_const_anchors): New function.
(cse_insn): Set src_related using anchor constants.  Insert
constant anchors into the table of available expressions.

* config/mips/mips.c (mips_rtx_costs): Make immediate-add even cheaper
than loading a simple constant into a register.

Index: gcc/cse.c
===
--- gcc.orig/cse.c  2009-03-08 12:16:56.0 -0700
+++ gcc/cse.c   2009-03-16 23:07:40.0 -0700
@@ -3961,6 +3961,55 @@ record_jump_cond (enum rtx_code code, en
 
   merge_equiv_classes (op0_elt, op1_elt);
 }
+
+#define TARGET_CONST_ANCHOR 0x8000
+
+/* Compute the upper and lower anchors for CST as base, offset pairs.  Return
+   NULL_RTX if CST is equal to an anchor.  */
+
+static rtx
+get_const_anchors (rtx cst, rtx *upper_base, HOST_WIDE_INT *upper_offs,
+  HOST_WIDE_INT *lower_offs)
+{
+  HOST_WIDE_INT n, upper, lower;
+
+  n = INTVAL (cst);
+  lower = n & ~(TARGET_CONST_ANCHOR - 1);
+  if (n == lower)
+return NULL_RTX;
+  upper = (n + (TARGET_CONST_ANCHOR - 1)) & ~(TARGET_CONST_ANCHOR - 1);
+
+  *upper_base = GEN_INT (upper);
+  *upper_offs = n - upper;
+  *lower_offs = n - lower;
+  return GEN_INT (lower);
+}
+
+/* Create equivalences between the two anchors of a constant value and the
+   corresponding register-offset expressions.  Use the register REG, which is
+   equivalent to the constant value CLASSP->exp.  */
+
+static void
+insert_const_anchors (rtx reg, struct table_elt *classp,
+ enum machine_mode mode)
+{
+  rtx lower_base, upper_base;
+  HOST_WIDE_INT lower_offs, upper_offs;
+  rtx lower_exp, upper_exp;
+  struct table_elt *celt;
+  rtx cst = classp->exp;
+
+  lower_base = get_const_anchors (cst, &upper_base, &upper_offs, &lower_offs);
+  if (!lower_base)
+return;
+  lower_exp = plus_constant (reg, -lower_offs);
+  upper_exp = plus_constant (reg, -upper_offs);
+
+  celt = insert (lower_base, NULL, HASH (lower_base, mode), mode);
+  insert (lower_exp, celt, HASH (lower_exp, mode), mode);
+  celt = insert (upper_base, NULL, HASH (upper_base, mode), mode);
+  insert (upper_exp, celt, HASH (upper_exp, mode), mode);
+}
 
 /* CSE processing for one instruction.
First simplify sources and addresses of all assignments
@@ -4595,6 +4644,67 @@ cse_insn (rtx insn)
}
 #endif /* LOAD_EXTEND_OP */
 
+  /* Try to express the constant using a register-offset expression using
+anchor constants.  */
+
+  if (!src_related && src_const && GET_CODE (src_const) == CONST_INT)
+   {
+ rtx lower_base, upper_base;
+ struct table_elt *lower_elt, *upper_elt, *elt;
+ HOST_WIDE_INT lower_offs, upper_offs, offs;
+
+ lower_base = get_const_anchors (src_const, &upper_base, &upper_offs,
+ &lower_offs);
+ if (lower_base)
+   {
+ lower_elt = lookup (lower_base, HASH (lower_base, mode), mode);
+ upper_elt = lookup (upper_base, HASH (upper_base, mode), mode);
+
+ /* Loop over LOWER_ELTs and UPPER_ELTs to find a reg-offset pair
+that we can use to express SRC_CONST.  */
+ elt = NULL;
+ if (lower_elt)
+   {
+ elt = lower_elt->first_same_value;
+ offs = lower_offs;
+   }
+ else if (upper_elt)
+   {
+ elt = upper_elt->first_same_value;
+ upper_elt = NULL;
+ offs = upper_offs;
+   }
+ while (elt)
+   {
+ if (REG_P (elt->exp)
+ || (GET_CODE (elt->exp) == PLUS
+ && REG_P (XEXP (elt->exp, 0))
+ && GET_CODE (XEXP (elt->exp, 1)) == CONST_INT))
+   {
+ rtx x = plus_constant (elt->exp, offs);
+ if (REG_P (x)
+ || (GET_CODE (x) == PLUS
+ && IN_RANGE (INTVAL (XEXP (x, 1)),
+  -TARGET_CONST_ANCHOR,
+  TARGET_CONST_ANCHOR - 1)))
+   {
+ src_related = x;
+ break;
+   }
+   }
+
+ if (!elt->next_same_value && upper_elt)
+   {
+ elt = upper_elt->first_same_value;
+ upper_elt = NULL;
+ offs = upper_offs;
+   }
+ else
+   elt = elt->next_same_value;
+

RE: ARM compiler rewriting code to be longer and slower

2009-03-16 Thread Adam Nemet
Ramana Radhakrishnan writes:
> [Resent because of account funnies. Apologies to those who get this twice]
> 
> Hi,
> 
> > > This problem is reported every once in a while, all targets with
> > small
> > > load-immediate instructions suffer from this, especially since GCC
> > 4.0
> > > (i.e. since tree-ssa).  But it seems there is just not enough
> > interest
> > > in having it fixed somehow, or someone would have taken care of it by
> > > now.
> > >
> > > I've summed up before how the problem _could_ be fixed, but I can't
> > > find where.  So here we go again.
> > >
> > > This could be solved in CSE by extending the notion of "related
> > > expressions" to constants that can be generated from other constants
> > > by a shift. Alternatively, you could create a simple, separate pass
> > > that applies CSE's "related expressions" thing in dominator tree
> > walk.
> > 
> > See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for
> > handling
> > something similar when related expressions differ by a small additive
> > constant.  I am planning to finish this and submit it for 4.5.
> 
> Wouldn't doing this in CSE only solve the problem within an extended basic
> block and not necessarily across the program ? Surely you'd want to do it
> globally or am I missing something very basic here ?

No, you're not.  There are plans to move some of what's in CSE to a new LCM
(global) pass.  Also note that for a global pass you clearly need a more
sophisticated cost model for deciding when CSEing is beneficial.  On a
superscalar architecture, instructions synthesizing constants sometimes appear
to be "free", whereas holding a value in a register for an extended period of
time is not.

Adam


Re: ARM compiler rewriting code to be longer and slower

2009-03-15 Thread Adam Nemet
Steven Bosscher  writes:
> On Sun, Mar 15, 2009 at 11:19 PM, Ramana Radhakrishnan
>  wrote:
>> I'm not sure about the best way to fix this but I've filed this for
>> the moment as
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39468
>
> This problem is reported every once in a while, all targets with small
> load-immediate instructions suffer from this, especially since GCC 4.0
> (i.e. since tree-ssa).  But it seems there is just not enough interest
> in having it fixed somehow, or someone would have taken care of it by
> now.
>
> I've summed up before how the problem _could_ be fixed, but I can't
> find where.  So here we go again.
>
> This could be solved in CSE by extending the notion of "related
> expressions" to constants that can be generated from other constants
> by a shift. Alternatively, you could create a simple, separate pass
> that applies CSE's "related expressions" thing in dominator tree walk.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for handling
something similar when related expressions differ by a small additive
constant.  I am planning to finish this and submit it for 4.5.

Adam


Re: mips64*-*-linux multi arch handling

2009-03-04 Thread Adam Nemet
Thanks for measuring these.

Laurent GUERBY  writes:
> For bzip2 trunk regress badly in performance against 4.3.2
> but n32 is indeed relatively faster than 32 (still slower than 4.3.2).
> For gzip trunk matches 4.3.2 but n32 is slower than 32 and 64 even
> slower.

Last time I checked (on Octeon) gzip was completely dominated by the hash
table search of the deflation code.  The new ABIs shine when you have function
calls or 64-bit arithmetic, which I don't think is the case in gzip.

I might look at the 4.3.2->trunk regressions (if I can reproduce on Octeon).

Adam


Re: Constant folding and Constant propagation

2009-03-03 Thread Adam Nemet
Adam Nemet  writes:
> I am actually looking at something similar for PR33699 for MIPS.  My plan is
> to experiment with extending cse.c to put "anchor" constants into the table
> of available expressions along with the original constant, and then to query
> those later for constant expressions.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00161.html.

Adam


Re: Constant folding and Constant propagation

2009-02-27 Thread Adam Nemet
"Rahul Kharche"  writes:
> GCSE won't help with your trimmed down example
>
> int main(void)
> {
> long a = 0xcafecafe;
>
> printf("Final: %lx %lx %lx\n", a, a+5, a+15);
> return EXIT_SUCCESS;
> }
>
> I believe Paolo's canon_reg solution together with tweaking
> rtx_cost of constants with outer code PLUS might help. Any
> luck with that pass?

As I understand it, Paolo's change is to move some of the functionality from
cse.c to a new LCM pass.  However, cse.c does not handle the above yet.

I am actually looking at something similar for PR33699 for MIPS.  My plan is
to experiment with extending cse.c to put "anchor" constants into the table of
available expressions along with the original constant, and then to query
those later for constant expressions.
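
A small worked example of what I have in mind (assuming an anchor granularity
of 0x8000; the numbers are mine): after seeing

  a = 0xcafecafe

we would record the anchors 0xcafe8000 == a - 0x4afe and 0xcaff0000 == a +
0x3502 as available expressions.  When 0xcafecb03 (i.e. a + 5) is needed
later, its lower anchor is the same 0xcafe8000, so the lookup yields
(a - 0x4afe) + 0x4b03, i.e. a + 5, instead of a new multi-insn constant load.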

Adam


Re: Constant folding and Constant propagation

2009-02-06 Thread Adam Nemet
Bernd Schmidt  writes:
> Take a look at reload_cse_move2add.

I don't think that's powerful enough; it requires the same destination
register:

  /* Try to transform (set (REGX) (CONST_INT A))
  ...
  (set (REGX) (CONST_INT B))
 to
  (set (REGX) (CONST_INT A))
  ...
  (set (REGX) (plus (REGX) (CONST_INT B-A)))

I think you really need Joern's optimize_related_values patch.  Also see
PR33699.

Adam


Re: About -fstack-protector in mips

2009-02-05 Thread Adam Nemet
Yoriko Komatsuzaki  writes:
> Could you tell me why it doesn't work well around FRAME_GROWS_DOWNWARD
> on mips?

I have a WIP patch for this but have been holding it back, mostly because of
stage 3/4 and because I was trying to make FRAME_GROWS_DOWNWARD the default
and was running into performance issues.

The patch below enables FRAME_GROWS_DOWNWARD with -mframe-grows-downward
(mostly for testing) and with -fstack-protector.

Adam

Index: config/mips/mips.opt
===
--- config/mips/mips.opt(revision 142249)
+++ config/mips/mips.opt(working copy)
@@ -283,3 +283,6 @@ Perform VR4130-specific alignment optimi
 mxgot
 Target Report Var(TARGET_XGOT)
 Lift restrictions on GOT size
+
+mframe-grows-downward
+Target Var(flag_frame_grows_downward) Init(1)
Index: config/mips/mips.c
===
--- config/mips/mips.c  (revision 142249)
+++ config/mips/mips.c  (working copy)
@@ -274,10 +274,10 @@ struct mips_frame_info GTY(()) {
   HOST_WIDE_INT gp_sp_offset;
   HOST_WIDE_INT fp_sp_offset;
 
-  /* The offset of arg_pointer_rtx from frame_pointer_rtx.  */
+  /* The offset of arg_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT arg_pointer_offset;
 
-  /* The offset of hard_frame_pointer_rtx from frame_pointer_rtx.  */
+  /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT hard_frame_pointer_offset;
 };
 
@@ -8593,10 +8593,9 @@ mips_compute_frame_info (void)
 
   cfun->machine->global_pointer = mips_global_pointer ();
 
-  /* The first STARTING_FRAME_OFFSET bytes contain the outgoing argument
- area and the $gp save slot.  This area isn't needed in leaf functions,
- but if the target-independent frame size is nonzero, we're committed
- to allocating it anyway.  */
+  /* The first bytes contain the outgoing argument area and the $gp save slot.
+ This area isn't needed in leaf functions, but if the target-independent
+ frame size is nonzero, we're committed to allocating it anyway.  */
   if (size == 0 && current_function_is_leaf)
 {
   /* The MIPS 3.0 linker does not like functions that dynamically
@@ -8612,7 +8611,7 @@ mips_compute_frame_info (void)
   else
 {
   frame->args_size = crtl->outgoing_args_size;
-  frame->cprestore_size = STARTING_FRAME_OFFSET - frame->args_size;
+  frame->cprestore_size = MIPS_GP_SAVE_AREA_SIZE;
 }
   offset = frame->args_size + frame->cprestore_size;
 
@@ -8746,12 +8745,16 @@ mips_initial_elimination_offset (int fro
 
   mips_compute_frame_info ();
 
-  /* Set OFFSET to the offset from the soft frame pointer, which is also
- the offset from the end-of-prologue stack pointer.  */
+  /* Set OFFSET to the offset from the end-of-prologue stack pointer.  */
   switch (from)
 {
 case FRAME_POINTER_REGNUM:
-  offset = 0;
+  if (FRAME_GROWS_DOWNWARD)
+   offset = (cfun->machine->frame.args_size
+ + cfun->machine->frame.cprestore_size
+ + cfun->machine->frame.var_size);
+  else
+   offset = 0;
   break;
 
 case ARG_POINTER_REGNUM:
Index: config/mips/mips.h
===
--- config/mips/mips.h  (revision 142249)
+++ config/mips/mips.h  (working copy)
@@ -2035,14 +2035,20 @@ enum reg_class
 
 /* Stack layout; function entry, exit and calling.  */
 
+#define FRAME_GROWS_DOWNWARD (flag_frame_grows_downward || flag_stack_protect)
+
 #define STACK_GROWS_DOWNWARD
 
-/* The offset of the first local variable from the beginning of the frame.
-   See mips_compute_frame_info for details about the frame layout.  */
+#define MIPS_GP_SAVE_AREA_SIZE \
+  (TARGET_CALL_CLOBBERED_GP ? MIPS_STACK_ALIGN (UNITS_PER_WORD) : 0)
+
+/* The offset of the first local variable from the frame pointer.  See
+   mips_compute_frame_info for details about the frame layout.  */
 
-#define STARTING_FRAME_OFFSET  \
-  (crtl->outgoing_args_size\
-   + (TARGET_CALL_CLOBBERED_GP ? MIPS_STACK_ALIGN (UNITS_PER_WORD) : 0))
+#define STARTING_FRAME_OFFSET  \
+  (FRAME_GROWS_DOWNWARD\
+   ? 0 \
+   : crtl->outgoing_args_size + MIPS_GP_SAVE_AREA_SIZE)
 
 #define RETURN_ADDR_RTX mips_return_addr
 



Re: MIPS64 -msym32 and DWARF2_ADDR_SIZE

2009-02-01 Thread Adam Nemet
Richard Sandiford writes:
> How about the patch below?  I'll apply it in the next couple of days
> if there are no objections.

Thanks for the patch.  I also like the new comments you added.

Adam


MIPS64 -msym32 and DWARF2_ADDR_SIZE

2009-01-30 Thread Adam Nemet
-msym32 changes DWARF's address_size from 64 bits to 32 bits.  This means that
while symbols are 64-bit (due to ELF64), target addresses in the debug info
are 32-bit.

There is support for this in DWARF, of course; in fact, you can specify a
different address_size for each compilation unit, which maps nicely onto
-msym32 being link-compatible with regular N64 objects.

However, this asymmetry exposed several bugs in binutils.  Also, as I just
discovered today, dwarfdump (libdwarf) has no support for changing the
address_size between compilation units and in fact derives address_size from
the ELF class (ELF64/ELF32).  (Obviously, that's a bug in libdwarf.)

So my question is whether the saving in the size of the debug info with
-msym32 is really worth the trouble here or should we just start generating
64-bit addresses with -msym32?

Adam


Re: New GCC Runtime Library Exception

2009-01-27 Thread Adam Nemet
Gerald Pfeifer  writes:
> +January 27, 2008

2009 ;)

Adam


Missed bitfield packing?

2009-01-15 Thread Adam Nemet
struct s
{
  char a:4;
  char b:8;
  char c:4;
} __attribute__ ((packed));

is 3 bytes long because b gets pushed to the next byte boundary.

You would think that similarly:

struct t
{
  char  a:4;
  short b:16;
  char  c:4;
} __attribute__ ((packed));

would come out to be 4 bytes long.  However, that's not the case because here,
b gets bit-packed.

The reason for this behavior is that finish_struct does not propagate packed
to fields whose type has alignment <= BITS_PER_UNIT:

  for (x = fieldlist; x; x = TREE_CHAIN (x))
    {
      if (TREE_TYPE (x) == error_mark_node)
        continue;

      DECL_CONTEXT (x) = t;

      if (TYPE_PACKED (t) && TYPE_ALIGN (TREE_TYPE (x)) > BITS_PER_UNIT)
        DECL_PACKED (x) = 1;

Is this intentional or an oversight?  Shouldn't the alignment check be
disregarded for DECL_BIT_FIELDs?
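
If the answer is yes, the change I have in mind is roughly the following (a
sketch only; it assumes DECL_BIT_FIELD is already valid on X at this point in
finish_struct):

  if (TYPE_PACKED (t)
      && (DECL_BIT_FIELD (x)
          || TYPE_ALIGN (TREE_TYPE (x)) > BITS_PER_UNIT))
    DECL_PACKED (x) = 1;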

Also note that extend.texi seems to be pretty clear about packed bitfields:

  @item packed
  @cindex @code{packed} attribute
  The @code{packed} attribute specifies that a variable or structure field
  should have the smallest possible alignment---one byte for a variable,
  and one bit for a field, unless you specify a larger value with the
  @code{aligned} attribute.

Adam


Re: gcc -r

2008-12-09 Thread Adam Nemet
Ralf Corsepius writes:
> So, my questions actually were aiming at
> 
> * whether
>  "gcc ... -nostdlib -r" 
> and 
> "gcc ... -nostdlib -Wl,-r"
> are equivalent 
> 
> * if the fact that "gcc -r" appears to work, can be exploited or whether
> this is a "random accident" and/or intentionally undocumented feature,
> which may go away at any time.

Right and Ian has already answered this.  I just wanted to add that AFAIC the
most portable way to specify a partial link is still through the gcc driver.

(Whether you use -Wl,-r or -r depends on whether you believe this switch will
ever be used for something else in GCC.)

Adam


Re: gcc -r

2008-12-09 Thread Adam Nemet
Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> Ralf Corsepius <[EMAIL PROTECTED]> writes:
>
>> So what would you recommend: To use "gcc -r" or "gcc -Wl,-r" ?
>
> Ah, when you put the question like that, I would recommend "ld -r".
> This is the one case where you get no advantage from using the gcc
> driver to invoke the linker, and it can actually mess you up by adding
> unexpected files to the command line.

If a different linker emulation needs to be selected you're still better off
using gcc.  E.g., on 64-bit MIPS Linux with the n64 ABI, this:

  gcc -nostdlib -Wl,-r -mabi=64 ...

is probably easier to remember than this:

  ld -melf64btsmip -r ...

Adam


FRAME_GROWS_DOWNWARD and expand_stack_vars

2008-12-07 Thread Adam Nemet
I am working on FRAME_GROWS_DOWNWARD for MIPS and I am seeing two performance
issues with the code generated.  The first one has to do with the order how
expand_stack_vars places locals on the stack.

Consider this function (simplified from CSiBE's
replaypc-0.4.0.preproc:find-GOPs):

  f ()
  {
char buf[32768];
int i;
g (&buf, &i);
  }

expand_stack_vars allocates variables in increasing order of their size.  So
with !FRAME_GROWS_DOWNWARD smaller variables are closer to the stack pointer
making it more likely that they can be accessed with base + offset addressing.

With FRAME_GROWS_DOWNWARD the opposite happens, so computing the address of
variable i now takes multiple instructions (its frame offset no longer fits
in the 16-bit displacement of a MIPS load/store).

To fix this we can change the allocation with FRAME_GROWS_DOWNWARD (see the
patch below).  However, I can see that on some targets, for example if the
hard frame pointer is used and it points somewhere above or at the local
variables (unlike on MIPS), you'd still want the original order.  So maybe this
should really be controlled by another backend knob.

Comments?

Adam

* cfgexpand.c (expand_stack_vars): Expand variables in decreasing
order of their size if FRAME_GROWS_DOWNWARD.

Index: cfgexpand.c
===
--- cfgexpand.c (revision 142249)
+++ cfgexpand.c (working copy)
@@ -901,7 +901,13 @@ expand_stack_vars (bool (*pred) (tree))
 {
   HOST_WIDE_INT offset;
 
-  i = stack_vars_sorted[si];
+  /* If the frame grows downward allocate larger objects first, away from
+ the stack pointer.  We have a better chance this way to access more
+ locals with base + offset addressing.  */
+  if (FRAME_GROWS_DOWNWARD)
+   i = stack_vars_sorted[n - 1 - si];
+  else
+   i = stack_vars_sorted[si];
 
   /* Skip variables that aren't partition representatives, for now.  */
   if (stack_vars[i].representative != i)



FAIL: gcc.target/mips/octeon-exts-2.c scan-assembler-times

2008-09-22 Thread Adam Nemet
Hi David,

David Daney writes:
> gcc.target/mips/octeon-exts-2.c is failing when configured --with-arch=sb1
> 
> Do you know if it is failing universally or only on non-octeon targets?

It's failing on little-endian.  There is also another test that's failing on
64-bit little-endian.  I will submit a patch later.  I was just waiting for
the other Octeon patches to settle.  Thanks for the note.

Adam


Re: New assert in haifa-sched.c

2008-09-16 Thread Adam Nemet
Eric Botcazou writes:
> > Yes, the assert is really checking exactly that.  Several pieces of
> > haifa-sched.c assume that the instruction has been recognized during
> > scheduler initialization to speed up checking if instruction is normal
> > or some kind of use/clobber/asm.
> 
> What happens if an instruction hasn't been recognized upon reaching these 
> parts of haifa-sched.c?  Will it be only mis-scheduled, i.e. will this only 
> result in inferior, but still correct code?  If so, the assertion shouldn't 
> be enabled in release mode but only if ENABLE_CHECKING is defined.

Well, I can't bootstrap as it is so it would be nice if this was fixed
regardless of the checking level.  (And this is what my patch does in
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00574.html.)  Another option
would be to completely remove the assert.

Adam


Re: Bootstrap failure in sparc-sun-solaris2.10

2008-09-12 Thread Adam Nemet
Eric Botcazou <[EMAIL PROTECTED]> writes:
>> Applying the patch Adam created and posted in the message below resolved
>> the issue and the compiler successfully bootstrapped:
>>
>> http://gcc.gnu.org/ml/gcc/2008-09/msg00139.html
>
> Thanks for reporting this.  I now can close PR 37424.
>
>> There was one reply to this message; I don't know if the patch is being
>> reworked or been formally submitted yet, but it did fix my build.
>
> OK, I'll take a look.

Yes it was formally submitted here, no review so far:

  http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00574.html

Adam


Re: New assert in haifa-sched.c

2008-09-05 Thread Adam Nemet
Maxim Kuvyrkov writes:
> I'm not 100% sure about current state of things, considering recent 
> merge of sel-sched, but before that it was:
> 
> set_priorities() -> priority() -> dep_cost() -> recog_memoized().

I don't think that was the case for all insns even before the patch.  The only
new thing is the assert which now catches this.

If the consumer is an asm just like in gcc.dg/tree-ssa/stdarg-2.c:f10() then
we would not call recog on the producer inside dep_cost*.
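
(Roughly the shape that triggers it, sketched in C; the real case is the asm
in f10:)

  int
  f (int x)
  {
    int y = x + 1;                      /* producer insn; never recognized  */
    __asm__ volatile ("" : : "r" (y));  /* consumer is an asm, so dep_cost_1
                                           takes the cost-0 path without
                                           recognizing the producer         */
    return y;
  }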

The patch below fixes the issue for me.  I am going to test this if it looks
good to people.

Adam

Index: haifa-sched.c
===
--- haifa-sched.c   (revision 139918)
+++ haifa-sched.c   (working copy)
@@ -646,7 +646,8 @@ insn_cost (rtx insn)
 
 /* Compute cost of dependence LINK.
This is the number of cycles between instruction issue and
-   instruction results.  */
+   instruction results.  We also use this function to call
+   recog_memoized on all insns.  */
 int
 dep_cost_1 (dep_t link, dw_t dw)
 {
@@ -657,7 +658,10 @@ dep_cost_1 (dep_t link, dw_t dw)
  This allows the computation of a function's result and parameter
  values to overlap the return and call.  */
   if (recog_memoized (used) < 0)
-cost = 0;
+{
+  cost = 0;
+  recog_memoized (DEP_PRO (link));
+}
   else
 {
   rtx insn = DEP_PRO (link);
@@ -2305,6 +2309,8 @@ choose_ready (struct ready_list *ready, 
  {
insn = ready_element (ready, i);
 
+   /* If this insn is recognizable we should have already
+  recognized it in dep_cost_1.  */
gcc_assert (INSN_CODE (insn) >= 0
|| recog_memoized (insn) < 0);
 



Re: New assert in haifa-sched.c

2008-09-05 Thread Adam Nemet
Maxim Kuvyrkov writes:
> Yes, the assert is really checking exactly that.  Several pieces of 
> haifa-sched.c assume that the instruction has been recognized during 
> scheduler initialization to speed up checking if instruction is normal 
> or some kind of use/clobber/asm.

Thanks for the info but I can't seem to find the code where this is supposed
to be happening.  Can you point me to the code?

Adam


New assert in haifa-sched.c

2008-09-04 Thread Adam Nemet
haifa-sched.c:
  2302      /* Let the target filter the search space.  */
  2303      for (i = 1; i < ready->n_ready; i++)
  2304        if (!ready_try[i])
  2305          {
  2306            insn = ready_element (ready, i);
  2307
  2308            gcc_assert (INSN_CODE (insn) >= 0
  2309                        || recog_memoized (insn) < 0);

I am hitting this assert with the Octeon pipeline patch.  Isn't the check on
recog_memoized reversed?  Or are we really asserting here that the insn has
either been recognized earlier or won't be recognized now either?!

Adam


Re: RFC: Adding non-PIC executable support to MIPS

2008-07-02 Thread Adam Nemet
Richard Sandiford writes:
> However, IMO, your argument about MTI being the central authority
> is a killer one.  The purpose of the GNU tools should be to follow
> appropriate standards where applicable (and extend them where it
> seems wise).  So from that point of view, I agree that the GNU tools
> should follow the ABI that Nigel and MTI set down.  Consider my
> patch withdrawn.

While I'm not entirely clear how this decision came about I'd like to point
out that it is unfortunate that MTI had not sought wider consensus for this
ABI extension among MIPS implementors and the community.

We would not be in this situation with duplicated efforts and much frustration
if this proposal had been circulated properly ahead of time.

> I've been thinking about that a lot recently, since I heard about
> your implementation.  I kind-of guessed it had been agreed with MTI
> beforehand (although I hadn't realised MTI themselves had written
> the specification).  Having thought it over, I think it would be best
> if I stand down as a MIPS maintainer and if someone with the appropriate
> commercial connections is appointed instead.  I'd recommend any
> combination of yourself, Adam Nemet and David Daney (subject to
> said people being willing, of course).

Richard, while I understand your frustration I really hope that you will
reconsider your decision and remain the MIPS maintainer.  I think there is a
chance that if the community expresses that MTI should seek broader consensus
for such proposals they will do so in the future.

Your expertise as the GCC maintainer has improved the backend tremendously,
and you should be given all the information necessary to continue your great
work.

Adam


Re: newlib & libgcov

2008-06-12 Thread Adam Nemet
"Bingfeng Mei" <[EMAIL PROTECTED]> writes:
> Both -Dinhibit_libc and -DL_gcov are used as shown in our building log
> file.  I guess -Dinhibit_libc is added because we used newlibc instead
> of glibc. I tried to grep these functions in newlibc and didn't find
> them.  My question is how to enable gcov with newlibc.  Do I need to
> write my own versions of them?  Any suggestion is greatly appreciated.  

You need to configure with --with-headers so that the headers are used during
the build and that inhibit_libc is not defined.  I think this is what the
crosstool-newlib script does as well.
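
For example, something along these lines (target name and paths are purely
illustrative):

  .../gcc/configure --target=your-target --with-newlib \
      --with-headers=/path/to/newlib/include ...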

Adam


Re: About the gcc Warning: setting incorrect section attributes for .note

2008-05-22 Thread Adam Nemet
"XU SHENG" <[EMAIL PROTECTED]> writes:
>   int priv_dat  __attribute__ ((section(".note"))) = MAGIC;
>
...
>
>   It's clear to me that only section marked attribute with the
> startup of ".note" can be compiled to section with type SHT_NOTE in ELF
> file. Then only problem is assembler in gcc assumed ".note" section as
> "aw" (allocate/write) as default property, but actually, the .note
> section can be only marked with "a" (note section is readonly). If I
> modify the temporary .s export by gcc assembler by delete "w" property,
> the warning is disappeared.

Did you try making priv_dat const?

Adam


Re: DFA Scheduler - unable to pipeline loads

2007-09-04 Thread Adam Nemet
Matt Lee writes:
> In any case, I am trying to optimize the case where there is clearly no
> aliasing. Your suggestion regarding movmemsi is interesting. I have not used
> this pattern before and assumed that it was required only when something
> special must be done to do block moves. In my architecture, a block move is
> not special and is equivalent a series of loads and stores. Why do I need
> this pattern and why/how does the aliasing conflict become a non-issue when
> defining this pattern?

When expanding this pattern you can take advantage of the fact that source and
destination cannot overlap.  Thus, you can expand the loads and stores more
freely (see mips_block_move_straight).  The type-based aliasing code does not
retain this information so as soon as you are finished expanding RTL you have
practically frozen the order of these instructions.
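
(In C terms, the freedom this buys looks roughly like the
load-everything-then-store-everything sketch below; the word size and count
are only illustrative of what mips_block_move_straight does for a small
fixed-size copy.)

  void
  copy16 (int *dst, const int *src)
  {
    int t0 = src[0], t1 = src[1], t2 = src[2], t3 = src[3];  /* all the loads */
    dst[0] = t0; dst[1] = t1; dst[2] = t2; dst[3] = t3;      /* then stores   */
  }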

I agree with you that you should not be required to provide this pattern but I
don't think there is a better way currently.  We would need to subset the
original alias set, similarly to restrict, but since memcpy switches the alias
set of its operands to zero this approach would not work.

Adam


Re: DFA Scheduler - unable to pipeline loads

2007-08-31 Thread Adam Nemet
"Matt Lee" <[EMAIL PROTECTED]> writes:

> I am seeing poor scheduling in Dhrystone where a memcpy call is
> expanded inline.
>
> memcpy (&dst, &src, 16) ==>
>
> load  1, rA + 4
> store 1, rB + 4
> load  2, rA + 8
> store 2, rB + 8
> ...

Are you sure that there are no dependencies due to aliasing here?  The
only similar thing that Dhrystone has to what you quote is between a
pointer and a global variable and in fact there is an aliasing
conflict there.

If that is the case you can define a movmem pattern where you first
load everything in one chunk and then store it later.  See MIPS's
movmemsi pattern and the function mips_block_move_straight.

Adam


Re: ICE building libgcc2.c for MIPS, too

2007-07-14 Thread Adam Nemet
Sandra Loosemore <[EMAIL PROTECTED]> writes:
> #6  0x0875dc03 in rest_of_handle_combine ()
> at /scratch/sandra/mips-mainline/src/gcc-mainline/gcc/combine.c:1264
>
> ...
>
> Looks like a job for valgrind?  But I'm out of time for working on
> this, at least for now.  Can anyone else take a stab at it?

I am seeing the same thing.  The issue is basically that after the
no_new_pseudos change, mips_split_symbol (through mips_force_temporary)
will create new pseudos during combine.  The register numbers of these regs
will index past the end of the reg_stat array in combine.

This is incorrect as in combine you are supposed to use the scratch
register that combine provides through a clobber clause which AFAIK is
always an existing pseudo.  I am planning to fix this by changing
mips_split_symbol to use the scratch register instead of a new pseudo
when invoked from a splitter.  I will try to get the patch tested
overnight.

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Adam Nemet
Richard Kenner writes:
> Otherwise, if you have
> 
>   struct foo {int a: 32; int b: 32; };
>   struct bar {int c: 32; int d: 32; };
> 
> you have the fields A and C conflicting, which is wrong.

Well, that is where structure-field aliasing comes in.  The two cannot
even alias for addressable fields:

struct s 
{ 
  int a; 
  int b; 
}; 
 
struct t 
{ 
  int a; 
  int b; 
}; 
 
struct s *a; 
struct t *b; 
 
main () 
{ 
  a->a = 1; 
  b->a = 2; 
  return a->a; 
} 

        movq    a(%rip), %rax
        movl    $1, (%rax)
        movq    b(%rip), %rax
        movl    $2, (%rax)
        movl    $1, %eax
        ret

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Adam Nemet
Richard Kenner writes:
> > I am glad to see we are converging toward implementation issues now!
> > 
> > I am storing it in a new field under the alias_set_entry:
> > 
> >   get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set.
> 
> Where T is which type? 

Type of the expression passed to get_alias_set.  And without the
component_uses_parent_alias_set loop.

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Adam Nemet
Richard Kenner writes:
> But there's also an implementation issue: where do you propose to *store*
> this fake alias set?  There is no such field as DECL_ALIAS_SET.

I am glad to see we are converging toward implementation issues now!

I am storing it in a new field under the alias_set_entry:

  get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set.

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-17 Thread Adam Nemet
Eric Botcazou writes:
> For the Ada testcase:
> 
> ;; s->i = 0
> (insn 7 6 0 p.adb:5 (set (mem/s/j:SI (reg/v/f:DI 59 [ s ]) [4 .i+0 
> S4 A32])
> (const_int 0 [0x0])) -1 (nil))
> 
> ;; *p = 1
> (insn 8 7 0 p.adb:6 (set (mem:SI (reg/v/f:DI 60 [ p ]) [2 S4 A32])
> (const_int 1 [0x1])) -1 (nil))
> 
> ;; return s->i
> (insn 9 8 10 p.adb:6 (set (reg:SI 62)
> (mem/s/j:SI (reg/v/f:DI 59 [ s ]) [4 .i+0 S4 A32])) -1 
> (nil))
> 
> i.e. s->i is accessed with the alias set of 'S'.

Thanks, that helped.  I think you're right.  Obviously we don't have
this issue with bitfields in C.

I am trying now to prototype a new approach along the lines of
returning true in component_uses_parent_alias_set for SFT's with
DECL_NONADDRESSABLE_P.

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-15 Thread Adam Nemet
Daniel Berlin writes:
> On 6/15/07, Adam Nemet <[EMAIL PROTECTED]> wrote:
> > get_alias_set and internally record_component_aliases makes
> > assumptions about the IR that are only valid in RTL.
> 
> What is this assumption, exactly?

That non-addressable fields are always accessed through alias set 0.
For example:

  X: (set (mem A) ...)
  Y: (set (zero_extract (mem B) 2 2) ...)

Let's say that A and B both point to the same type and have the same
alias set 1.  The bitfield store in Y would normally have the alias
set of the field, let's say 2.  Normally 2 would have to be recorded
as an alias subset of 1.  This is not done in RTL and the comment
before record_component_aliases mentions this:

/* Record that component types of TYPE, if any, are part of that type for
   aliasing purposes.  For record types, we only record component types
   for fields that are marked addressable.

If I am not mistaken this is because the functions expanding RTL
ensure that the alias set of the mem operand in Y has alias set 0.  Or
to quote expmed.c:

  /* We may be accessing data outside the field, which means
     we can alias adjacent data.  */
  if (MEM_P (op0))
    {
      op0 = shallow_copy_rtx (op0);
      set_mem_alias_set (op0, 0);
      set_mem_expr (op0, 0);
    }

Adam


Incorrect bitfield aliasing with Tree SSA

2007-06-15 Thread Adam Nemet
get_alias_set and, internally, record_component_aliases make
assumptions about the IR that are only valid in RTL.  As a consequence
the constant 1 is propagated into the function call in the example
below (I found this issue with 4.1 but it is also present on trunk):

  struct s
  {
long long a:12;
long long b:12;
long long c:40;
  };
  
  struct s s, *p = &s;
  
  f ()
  {
p->a = 1;
s.a = 2;
s.b = 3;
use (p->a, s.b);
  }

or with VOPS:

  # VUSE 
  p.0_1 = p;
  # SMT.7_9 = VDEF<- missing SFT.2 VDEF
  p.0_1->a = 1;
  # SFT.2_11 = VDEF 
  s.a = 2;
  # SFT.1_13 = VDEF 
  s.b = 3;
  # p_14 = VDEF 
  # SFT.1_15 = VDEF 
  # SFT.2_16 = VDEF 
  # SMT.7_17 = VDEF 
  use (1, 3) [tail call];
 

In RTL the expmed.c functions extract_bit_field and store_bit_field
change the alias set of a bit-extraction or bit-insertion expression
of a memory operand to 0 because the actual RTL generated will access
adjacent fields.  Therefore there is no need to represent an alias-subset
relationship between non-addressable fields and their containing
structure, and record_component_aliases will skip such fields.

First I was trying to remove this assumption even for RTL but that
increased memory consumption on the cc1 preprocessed files by 1%.

The other idea I had was to make this check dependent on the IR type.
The problem is that the function tree_register_cfg_hooks setting the
IR type is called only after gimplification whereas get_alias_set is
called during.  Would it be acceptable to call tree_register_cfg_hooks
earlier?  Hopefully, this will be a minimal change and would make it
easy to backport it to 4.1 and 4.2.

Another option would be to have a Tree-SSA-specific version of
get_alias_set and change all the invocations in the tree-* modules.

Do people have other recommendations or a preference?

Adam


Re: running bprob.exp tests in a cross-testing environment

2007-01-02 Thread Adam Nemet
Ben Elliston <[EMAIL PROTECTED]> writes:
> I see a couple of solutions, but would like to discuss them here before
> working on a patch:
>
> 1. have the bprob.exp test driver create the appropriate directory
>tree on the target (and remove it when finished); or
>
> 2. set GCOV_PREFIX when running the test case so that the home
>directory on the target is prefixed.  The test harness would
>need to also prefix the .gcda filename when fetching the data
>file from the target.

If it is not too late I'd prefer the latter.  If I understand the
problem correctly the former would still fail if the test user is not
privileged enough to recreate the directory structure under /.

Thanks for working on this.

Adam


Re: MIPS: comparison modes in conditional branches

2005-12-06 Thread Adam Nemet
Jim Wilson writes:
> Yes, it looks like fixing the combiner problem would make it possible to 
> remove the mistaken mode checks.

Thanks very much, Jim.  I will work toward removing these then.

Adam


MIPS: comparison modes in conditional branches

2005-12-04 Thread Adam Nemet
Hi,

I am trying to understand why the MIPS backend's handling of
comparison modes seems to be different from what rtl.texi says:

  The mode of the comparison operation is independent of the mode
  of the data being compared.  If the comparison operation is being tested
  (e.g., the first operand of an @code{if_then_else}), the mode must be
  @code{VOIDmode}.

In the MIPS backend conditional branches based on integer comparisons
are valid only if the mode of the comparison matches the mode of the
operands, e.g. in 64-bit mode (if_then_else (eq:DI (reg:DI 1)
(const_int 0)) ...).

This is relevant because combine could optimize this:

  (set (reg:DI 2) (and:DI (reg:DI 1) (const_int 1)))
  (set (reg:SI 3) (truncate:SI (reg:DI 2)))
  (set (pc) (if_then_else (eq:SI (reg:SI 3) (const_int 0)) ...))

into this:

  (set (reg:DI 2) (and:DI (regDI 1) (const_int 1)))
  (set (pc) (if_then_else (eq:SI (reg:DI 2) (const_int 0)) ...))

I think this is a valid transformation on MIPS64 too.  In fact I
believe that as long as we compare valid SImode or DImode values the
64-bit comparison should work fine.

If I remove the comparison modes from the conditional branch patterns
regression testing on mips64-elf does not show any problem (diff is
attached for reference).  When I however compare the assembly output
it is clear that some of the changes are incorrect.  This is
a simplified testcase illustrating the problem:

  f (long long a)
  {
    if ((a & 0xffffffff) != 0)
      abort ();
  }
  
  long long a = 0x12345678LL;
  main ()
  {
f (a);
  }

What happens is that combine.c:simplify_comparison, noticing the
zero-extension, transforms this:

  (ne:DI (and:DI (reg:DI 4) (const_int 0xffffffff)) (const_int 0))

into this:

  (ne:DI (reg:SI 4) (const_int 0))

And now we do allow a comparison like this in conditional branches.

On targets where DI->SI mode-switch is not a nop I think the
transformation in simplify_comparison is incorrect.  We need an
explicit truncate here to enforce valid SI values:

  (ne:DI (truncate:SI (reg:DI 4)) (const_int 0)).
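
To see concretely why the truncate matters (plain C rather than what the
branch pattern sees, and the value is made up):

  #include <stdio.h>

  int
  main (void)
  {
    unsigned long long r = 0x1234567800000000ULL;

    /* What the SImode comparison is supposed to mean: only the low
       32 bits participate.  */
    printf ("%d\n", (unsigned int) r != 0);   /* prints 0 */

    /* What a MIPS64 branch on the full 64-bit register tests once the
       truncation is dropped.  */
    printf ("%d\n", r != 0);                  /* prints 1 */
    return 0;
  }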

Now if I am correct and this last thing is really a bug then the
obvious question is whether it has anything to do with the more
restrictive form for conditional branches on MIPS64?  And of course if
I fix it then whether it would be OK to lift the mode restrictions in
the conditional branch patterns.

Thanks,
Adam

Index: mips.md
===
--- mips.md (revision 107758)
+++ mips.md (working copy)
@@ -4291,7 +4291,7 @@
 (define_insn "*branch_zero"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "comparison_operator"
+(match_operator 0 "comparison_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (const_int 0)])
 (label_ref (match_operand 1 "" ""))
@@ -4311,7 +4311,7 @@
 (define_insn "*branch_zero_inverted"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "comparison_operator"
+(match_operator 0 "comparison_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (const_int 0)])
 (pc)
@@ -4333,7 +4333,7 @@
 (define_insn "*branch_equality"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "equality_operator"
+(match_operator 0 "equality_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (match_operand:GPR 3 "register_operand" "d")])
 (label_ref (match_operand 1 "" ""))
@@ -4353,7 +4353,7 @@
 (define_insn "*branch_equality_inverted"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "equality_operator"
+(match_operator 0 "equality_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (match_operand:GPR 3 "register_operand" "d")])
 (pc)