Re: Combine or peephole?

2010-04-21 Thread Frank Isamov
 On Mon, Apr 19, 2010 at 5:54 PM, Jeff Law l...@redhat.com wrote:

 combine requires a data dependency, so for this situation, combine isn't
 going to help.  The easy solution is to create a peephole.    You can also
 create a machine dependent reorg pass to detect more of these opportunities.
 Jeff



 Hi Jeff, et al,

 Thank you for your reply. Two more questions:

 1. Is it possible to add a machine dependent reorg pass at the backend
 level without changing the standard infrastructure? If so, can you
 please point me to such an example? If not, might the new plugin
 architecture help here?
 2. A peephole for such a case just repeats the instruction definition
 pattern. As all the information is already available for such a
 peephole, wouldn’t it be useful to implement the pass as part of the
 standard infrastructure?

 Thank you,
 Frank


Re: Combine or peephole?

2010-04-21 Thread Frank Isamov
Hi Ian,

On Wed, Apr 21, 2010 at 5:42 PM, Ian Lance Taylor i...@google.com wrote:
 Frank Isamov frank.isa...@gmail.com writes:

  2. A peephole for such a case just repeats the instruction definition
  pattern. As all the information is already available for such a
  peephole, wouldn’t it be useful to implement the pass as part of the
  standard infrastructure?

 See define_peephole and define_peephole2.  If that doesn't answer your
 question, can you rephrase it?

 Ian


I think I understood the points from Jeff’s reply, and I am going to
look at the PA implementation now. For the completeness of this
thread, I’ll try to rephrase the initial question:

Instructions that manipulate data in parallel and have no data
dependency automatically require a peephole2 definition and/or a
machine dependent reorg pass. (Please see the example at the bottom of
this email.) The peephole2 pattern, in this case, just repeats the
instruction’s RTL pattern.
As such instructions can appear in SIMD architectures, I thought that
it would be profitable to have this pass be part of the common
infrastructure.


(define_insn "assi6"
  [(parallel [
     (set (match_operand:SI 0 "register_operand" "=r")
          (minus:SI (match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "register_operand" "r")))
     (set (match_operand:SI 3 "register_operand" "=r")
          (plus:SI (match_operand:SI 4 "register_operand" "r")
                   (match_operand:SI 5 "register_operand" "r")))
  ])]
  ""
  "as\t%5, %4, %3, %2, %1, %0 %!"
)

(define_peephole2
  [
     (set (match_operand:SI 0 "register_operand")
          (minus:SI (match_operand:SI 1 "register_operand")
                    (match_operand:SI 2 "register_operand")))
     (set (match_operand:SI 3 "register_operand")
          (plus:SI (match_operand:SI 4 "register_operand")
                   (match_operand:SI 5 "register_operand")))
  ]
  ""
  [(parallel [
     (set (match_dup 0)
          (minus:SI (match_dup 1)
                    (match_dup 2)))
     (set (match_dup 3)
          (plus:SI (match_dup 4)
                   (match_dup 5)))
  ])]
  ""
)


Combine or peephole?

2010-04-19 Thread Frank Isamov
Hi,

My architecture supports instructions with two parallel side effects.
For example, addition and subtraction can be done in parallel:

(define_insn "assi6"
  [(parallel [
     (set (match_operand:SI 0 "register_operand" "=r")
          (minus:SI (match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "register_operand" "r")))
     (set (match_operand:SI 3 "register_operand" "=r")
          (plus:SI (match_operand:SI 4 "register_operand" "r")
                   (match_operand:SI 5 "register_operand" "r")))
  ])]
  ""
  "as\t%5, %4, %3, %2, %1, %0 %!"
)

This instruction is not chosen at ‘combine’ time even if the ‘plus’ and
‘minus’ instructions are located one after the other. That is probably
because there is no data dependency between them.

In an attempt to resolve the problem, I am providing a peephole optimization:

(define_peephole2
  [
     (set (match_operand:SI 0 "register_operand")
          (minus:SI (match_operand:SI 1 "register_operand")
                    (match_operand:SI 2 "register_operand")))
     (set (match_operand:SI 3 "register_operand")
          (plus:SI (match_operand:SI 4 "register_operand")
                   (match_operand:SI 5 "register_operand")))
  ]
  ""
  [(parallel [
     (set (match_dup 0)
          (minus:SI (match_dup 1)
                    (match_dup 2)))
     (set (match_dup 3)
          (plus:SI (match_dup 4)
                   (match_dup 5)))
  ])]
  ""
)

This works for some cases, but I wanted to ask the experts whether this
is the way to go. Repeating the same pattern for a peephole might not
be the best way to resolve the problem.
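
One point the peephole above leaves open is when the two sets may legally
be fused. In a parallel, all sources are read before any destination is
written, so the fused form only matches the sequential form when the second
insn does not read the first insn's result and the two destinations differ.
The following is a minimal, standalone C sketch of that check. The struct
and function names are hypothetical and registers are modelled as plain
integers; in a real port the test would presumably live in the peephole2
condition string, for example built on reg_overlap_mentioned_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of two adjacent three-address insns:
     first:  d0 = s1 - s2
     second: d3 = s4 + s5
   Registers are identified by plain integers.  */
struct insn_pair
{
  int d0, s1, s2;   /* destination and sources of the first insn */
  int d3, s4, s5;   /* destination and sources of the second insn */
};

/* Return true if the two insns may be fused into a single parallel.
   A parallel reads all of its sources before writing any destination,
   so fusing is safe only when running the insns one after the other
   would give the same result.  */
bool
can_fuse (const struct insn_pair *p)
{
  assert (p);

  /* The second insn must not read the result of the first.  */
  if (p->s4 == p->d0 || p->s5 == p->d0)
    return false;

  /* The destinations must differ, otherwise one result would be lost.  */
  if (p->d0 == p->d3)
    return false;

  /* The second insn overwriting a source of the first is harmless:
     the first insn reads its sources before that write in both forms.  */
  return true;
}
```

With such a condition in place the peephole would reject a pair like
"r1 = r2 - r3; r4 = r1 + r6", where the addition needs the new value of r1.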

Have you observed something similar? What would be the best way to
resolve such a situation?

Thank you,
Frank


Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Frank Isamov
Hi,

In my backend, I have a problem with the pass which determines the
best register class for a virtual register (Pass 0 for finding allocno
costs).

In all insns in this example both R_REGS and D_REGS register classes
are applicable (but all registers in an insn should be from the same
register class).

This is the asmcons output:

;; Function mul (mul)
(note 1 0 5 NOTE_INSN_DELETED)

(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 5 3 2 NOTE_INSN_DELETED)

(insn 3 2 4 2 a.c:2 (set (reg/v:SI 97 [ b ])
(reg:SI 49 r1 [ b ])) 43 {movsi_regs} (expr_list:REG_DEAD
(reg:SI 49 r1 [ b ])
(nil)))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 9 2 a.c:5 (set (reg/v:SI 94 [ __a_11 ])
(plus:SI (reg:SI 48 r0 [ a ])
(reg/v:SI 97 [ b ]))) 0 {*addsi3_1} (expr_list:REG_DEAD
(reg:SI 48 r0 [ a ])
(nil)))

(insn 9 7 10 2 a.c:5 (parallel [
(set (reg:SI 100)
(ashift:SI (reg/v:SI 97 [ b ])
(const_int 3 [0x3])))
(clobber (reg:CC 88 cc))
]) 36 {ashlsi3} (expr_list:REG_UNUSED (reg:CC 88 cc)
(expr_list:REG_EQUAL (ashift:SI (reg/v:SI 97 [ b ])
(const_int 3 [0x3]))
(nil))))

(insn 10 9 11 2 a.c:5 (set (reg:SI 101)
(plus:SI (reg:SI 100)
(reg/v:SI 97 [ b ]))) 0 {*addsi3_1} (expr_list:REG_DEAD (reg:SI 100)
(expr_list:REG_DEAD (reg/v:SI 97 [ b ])
(expr_list:REG_EQUAL (mult:SI (reg/v:SI 97 [ b ])
(const_int 9 [0x9]))
(nil)))))

(insn 11 10 12 2 a.c:5 (set (reg:SI 102)
(plus:SI (reg:SI 101)
(reg/v:SI 94 [ __a_11 ]))) 0 {*addsi3_1}
(expr_list:REG_DEAD (reg:SI 101)
(expr_list:REG_DEAD (reg/v:SI 94 [ __a_11 ])
(nil))))

(note 12 11 17 2 NOTE_INSN_DELETED)

(insn 17 12 23 2 a.c:7 (parallel [
(set (reg/i:SI 48 r0)
(ashift:SI (reg:SI 102)
(const_int 10 [0xa])))
(clobber (reg:CC 88 cc))
]) 36 {ashlsi3} (expr_list:REG_UNUSED (reg:CC 88 cc)
(expr_list:REG_DEAD (reg:SI 102)
(nil))))

(insn 23 17 0 2 a.c:7 (use (reg/i:SI 48 r0)) -1 (nil))

The problem I see is that for registers 100,101 I get best register
class D instead of R – actually they get the same cost and D is chosen
(maybe because it is first).

But they should not get the same cost since choosing D_REGS causes two
additional copies.

As I understand it, the algorithm checks every insn, and since
register 100 appears only in insns in which all registers are still
virtual, any register class fits without additional cost. But later,
when it is colored with a D register, we need copies from R to D and
back to R.
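
The tie described above can be illustrated with a toy cost model. This is
only a sketch under stated assumptions, not IRA's actual algorithm: the
function names, the move cost of 2, and the rule that the first class wins
ties are all hypothetical. It shows that a pseudo whose own insns accept
either class gets equal costs, and that the tie disappears once the cost of
moving the value in from, and out to, a neighbouring class is added.

```c
#include <assert.h>

enum reg_class { D_REGS, R_REGS, N_CLASSES };

/* Hypothetical cost of copying a value from one class to another.  */
static int
move_cost (enum reg_class from, enum reg_class to)
{
  assert (from < N_CLASSES && to < N_CLASSES);
  return from == to ? 0 : 2;
}

/* Pick the cheapest class; on a tie the first class listed wins.  */
enum reg_class
best_class (const int cost[N_CLASSES])
{
  int best = D_REGS;
  for (int c = 0; c < N_CLASSES; c++)
    if (cost[c] < cost[best])
      best = c;
  return (enum reg_class) best;
}

/* Costs seen when only the pseudo's own insns are inspected: every
   alternative accepts either class, so nothing is charged.  */
void
local_costs (int cost[N_CLASSES])
{
  cost[D_REGS] = 0;
  cost[R_REGS] = 0;
}

/* Costs seen when the classes of the producer (in) and the consumer
   (out) of the value are also taken into account.  */
void
neighbour_costs (enum reg_class in, enum reg_class out, int cost[N_CLASSES])
{
  for (int c = 0; c < N_CLASSES; c++)
    cost[c] = move_cost (in, (enum reg_class) c)
              + move_cost ((enum reg_class) c, out);
}
```

In this model, local_costs yields a tie that best_class resolves to D_REGS,
while neighbour_costs (R_REGS, R_REGS, ...) charges D_REGS for the two
copies and R_REGS wins.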

Can someone please help with this?

Is there any reading material about this part of the IRA?

Thanks, Frank.


Re: Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Frank Isamov
-- Forwarded message --
From: Frank Isamov frank.isa...@gmail.com
Date: Thu, Mar 18, 2010 at 4:28 PM
Subject: Re: Coloring problem - Pass 0 for finding allocno costs
To: Ian Bolton bol...@icerasemi.com


On Thu, Mar 18, 2010 at 3:51 PM, Ian Bolton bol...@icerasemi.com wrote:
 The problem I see is that for registers 100,101 I get best register
 class D instead of R - actually they get the same cost and D is chosen
 (maybe because it is first).

 Hi Frank.

 Do D and R overlap?  It would be useful to know which regs are in
 which class, before trying to understand what is going on.

 Can you paste an example of your define_insn from your MD file to show
 how operands from D or R are both valid?  I ask this because it is
 possible to express that D is more expensive than R with operand
 constraints.

 For general IRA info, you might like to look over my long thread on
 here called Understanding IRA.

 Cheers,
 Ian


Hi Ian,

Thank you very much for your prompt reply.
D and R do not overlap. Please see the fragments of the .md and .h files below:

From the md:

(define_register_constraint "a" "R_REGS" "")
(define_register_constraint "d" "D_REGS" "")

(define_predicate "a_operand"
   (match_operand 0 "register_operand")
{
       unsigned int regno;
       if (GET_CODE (op) == SUBREG)
           op = SUBREG_REG (op);
       regno = REGNO (op);
       return (regno >= FIRST_PSEUDO_REGISTER
               || REGNO_REG_CLASS (regno) == R_REGS);
}
)

(define_predicate "d_operand"
   (match_operand 0 "register_operand")
{
       unsigned int regno;
       if (GET_CODE (op) == SUBREG)
           op = SUBREG_REG (op);
       regno = REGNO (op);
       return (regno >= FIRST_PSEUDO_REGISTER
               || REGNO_REG_CLASS (regno) == D_REGS);
}
)

(define_predicate "a_d_operand"
 (ior (match_operand 0 "a_operand")
      (match_operand 0 "d_operand")))

(define_predicate "i_a_d_operand"
 (ior (match_operand 0 "immediate_operand")
      (match_operand 0 "a_d_operand")))

(define_insn "mov<mode>_regs"
 [(set (match_operand:SISFM 0 "a_d_operand"   "=a, a, a, d, d, d")
       (match_operand:SISFM 1 "i_a_d_operand" " a, i, d, a, i, d"))]
 ""
 "move\t%1, %0"
)

(define_insn "*addsi3_1"
 [(set (match_operand:SI 0 "a_d_operand"                "=a, a,  a,d,d")
       (plus:SI (match_operand:SI 1 "a_d_operand"       "%a, a,  a,d,d")
                (match_operand:SI 2 "nonmemory_operand" "U05,S16,a,d,U05")))]
 ""
 "adda\t%2, %1, %0"
)

;;  Arithmetic Left and Right Shift Instructions
(define_insn "<shPat><mode>3"
 [(set (match_operand:SCIM 0 "register_operand" "=a,d,d")
       (sh_oprnd:SCIM (match_operand:SCIM 1 "register_operand" "a,d,d")
                      (match_operand:SI 2 "nonmemory_operand" "U05,d,U05")))
  (clobber (reg:CC CC_REGNUM))]
 ""
 "<shIsa>\t%2, %1, %0"
)

From the h file:

#define REG_CLASS_CONTENTS                                              \
 {                                                                      \
   {0x, 0x, 0x}, /* NO_REGS */                                          \
   {0x, 0x, 0x}, /* D_REGS */                                           \
   {0x, 0x, 0x}, /* R_REGS */                                           \

The ABI requires the use of R registers for arguments and the return
value. Other than that, all of these instructions are more or less
symmetrical with respect to using D or R. So the optimal choice for
this example would be to use R. If a D register is chosen, it involves
an additional copy from R to D and back to R.

Thank you, Frank


Advancing SP on a call

2010-03-10 Thread Frank Isamov
We have a problem with passing arguments in memory.

The caller puts the arguments in memory relative to the sp:
add sp, 4         // allocate space for the argument; the stack grows up
store r1, (sp-4)  // store the argument on the stack
call xxx          // call the function

In xxx the resulting code looks like:
load (sp-4), r1   // load the argument from the stack

The problem is that the 'call' instruction pushes the return address
onto the stack and increments the sp by 4, so when the callee tries to
access the memory it does not get to the correct location.

How can I tell GCC that the callee should load from the original
offset + 4?
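
The displacement in question can be worked through with a small standalone
simulation. This is only a sketch under the assumptions stated in the post
(stack grows up, the call pushes a 4-byte return address); the names
RETURN_ADDRESS_SIZE, callee_arg_offset and simulate_call are hypothetical.
In a GCC port, a fixed displacement of this kind is what the
FIRST_PARM_OFFSET target macro is meant to describe; whether that alone is
sufficient for this port is an assumption.

```c
#include <assert.h>

/* Assumed number of bytes the call instruction pushes for the
   return address.  */
#define RETURN_ADDRESS_SIZE 4

/* Offset of an argument from the sp as seen on entry to the callee,
   given the (negative) offset the caller used relative to its own sp
   just before the call.  The stack grows upward, so after the push the
   argument lies further below sp.  */
int
callee_arg_offset (int caller_offset)
{
  assert (caller_offset < 0);
  return caller_offset - RETURN_ADDRESS_SIZE;
}

/* Replay the sequence from the post on a tiny word-addressed memory
   and return what the callee loads.  */
int
simulate_call (int arg)
{
  int memory[8] = {0};
  int sp = 8;                      /* arbitrary starting sp, in bytes */

  sp += 4;                         /* caller: add sp, 4 */
  memory[(sp - 4) / 4] = arg;      /* caller: store r1, (sp-4) */

  memory[sp / 4] = 0x1234;         /* call: push a dummy return address */
  sp += RETURN_ADDRESS_SIZE;       /* call: sp advances past it */

  /* callee: load using the adjusted offset instead of (sp-4) */
  return memory[(sp + callee_arg_offset (-4)) / 4];
}
```

Here the caller's (sp-4) becomes (sp-8) inside the callee, which is the
"original offset + 4" in magnitude that the question asks about.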

Thanks.


Re: How to make 'long int' type be a PDImode?

2010-03-08 Thread Frank Isamov
On Mon, Mar 8, 2010 at 8:29 AM, Joern Rennecke
joern.renne...@embecosm.com wrote:
 Quoting Frank Isamov frank.isa...@gmail.com:

 Hi,

 I'd like to make a backend which would have 48 bits for 'long' type.
 (32 for int and 64 for long long).

 I have tried to define:
 #define LONG_TYPE_SIZE  48

 That's not a partial integer mode; PDImode would have the same size as
 DImode,
 just not all bits would be significant.

 and one of:
 INT_MODE (PDI, 6);

 And that wouldn't be PDImode, more like THImode (three-halves integer mode,
 going by the precedent of TQFmode - three-quarter float mode - of the 1750a
 port in GCC prior to version 3.1)
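
The distinction drawn in the reply above, a mode that is stored like DImode
while only some of its bits are significant, can be sketched in standalone C.
This is an illustration only: the 48-bit precision comes from the original
post, while LONG_PRECISION, sign_extend and add48 are hypothetical names and
say nothing about how GCC implements partial integer modes internally.

```c
#include <assert.h>
#include <stdint.h>

/* Significant bits of the hypothetical 'long'; the container is a
   full 64-bit value, as DImode would be.  */
#define LONG_PRECISION 48

/* Keep only the low BITS bits of VALUE and sign-extend them back to
   the full 64-bit container.  */
int64_t
sign_extend (int64_t value, unsigned bits)
{
  assert (bits > 0 && bits < 64);
  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t low = (uint64_t) value & mask;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  /* Flipping and then subtracting the sign bit propagates it upward.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Add two values as a 48-bit machine would: compute in the container,
   then discard the bits that are not significant.  */
int64_t
add48 (int64_t a, int64_t b)
{
  return sign_extend ((int64_t) ((uint64_t) a + (uint64_t) b),
                      LONG_PRECISION);
}
```

Adding 1 to the largest 48-bit value wraps around to the smallest one even
though the 64-bit container itself could hold the unwrapped sum.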



I am sorry, I still can conclude how to make the backend use a certain
mode for 'long' type.
My architecture implements 32- and 48-bit registers and instructions
operating on these registers. That is why my intention is to use 'int'
for 32-bit and 'long' for 48-bit operations. My attempts lead to the
backend ignoring the 48-bit path in spite of the LONG_TYPE_SIZE
definition, and 'long' is still processed as a 32-bit value. I think I
am missing something, so I am asking for help. Any piece of
information would be useful.

Thank you,
Frank


Re: How to make 'long int' type be a PDImode?

2010-03-08 Thread Frank Isamov
On Mon, Mar 8, 2010 at 4:27 PM, Frank Isamov frank.isa...@gmail.com wrote:
 On Mon, Mar 8, 2010 at 8:29 AM, Joern Rennecke
 joern.renne...@embecosm.com wrote:
 Quoting Frank Isamov frank.isa...@gmail.com:

 Hi,

 I'd like to make a backend which would have 48 bits for 'long' type.
 (32 for int and 64 for long long).

 I have tried to define:
 #define LONG_TYPE_SIZE  48

 That's not a partial integer mode; PDImode would have the same size as
 DImode,
 just not all bits would be significant.

 and one of:
 INT_MODE (PDI, 6);

 And that wouldn't be PDImode, more like THImode (three-halves integer mode,
 going by the precedent of TQFmode - three-quarter float mode - of the 1750a
 port in GCC prior to version 3.1)



 I am sorry, I still can conclude how to make the backend use a certain
 mode for 'long' type.
 My architecture implements 32- and 48-bit registers and instructions
 operating on these registers. That is why my intention is to use 'int'
 for 32-bit and 'long' for 48-bit operations. My attempts lead to the
 backend ignoring the 48-bit path in spite of the LONG_TYPE_SIZE
 definition, and 'long' is still processed as a 32-bit value. I think I
 am missing something, so I am asking for help. Any piece of
 information would be useful.

 Thank you,
 Frank


Correction: "I still can conclude" should be read as "I still cannot conclude".


How to make 'long int' type be a PDImode?

2010-03-07 Thread Frank Isamov
Hi,

I'd like to make a backend which would have 48 bits for 'long' type.
(32 for int and 64 for long long).

I have tried to define:
#define LONG_TYPE_SIZE  48

and one of:
INT_MODE (PDI, 6);
PARTIAL_INT_MODE (DI);

Unfortunately, when I try to compile a program, I see that the backend
still uses SImode for 'long'.

I could not find an example of a target doing similar work.
Could you please advise, or point me to an implementation I can refer to?

Thank you,
Frank