Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-13 Thread Richard Sandiford
Chung-Ju Wu jasonw...@gmail.com writes:
 +  /* If operands[1] is a large constant and cannot be performed
 + move behavior with single instruction, we need to split it.  */

Suggest ...cannot by performed by a single instruction...

 +  high20_rtx = GEN_INT ((INTVAL (operands[1])  12)  12);

Better to use gen_int_mode (..., SImode).  Although GEN_INT will be OK
if the compiler uses an arithmetic shift for , that isn't guaranteed
(although I doubt any modern compiler would use anything else).

 +  low12_rtx = GEN_INT (INTVAL (operands[1])  0xfff);
 +
 +  emit_move_insn (tmp_rtx, high20_rtx);
 +  emit_move_insn (operands[0], plus_constant (SImode,
 +   tmp_rtx,
 +   INTVAL (low12_rtx)));

There's no need to create an rtx and then use INTVAL on it.
low12 can just be a HOST_WIDE_INT.

Looks good to me with those changes, thanks.

Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Richard Sandiford
Chung-Ju Wu jasonw...@gmail.com writes:
 On 10/2/13 1:31 AM, Richard Sandiford wrote:
 Chung-Ju Wu jasonw...@gmail.com writes:
 +  /* Use $r15, if the value is NOT in the range of Is20,
 + we must output sethi + ori directly since
 + we may already passed the split stage.  */
 +  return sethi\t%0, hi20(%1)\;ori\t%0, %0, lo12(%1);
 +case 17:
 +  return #;
 
 I don't really understand the comment for case 16.  Returning #
 (like for case 17) forces a split even at the output stage.
 
 In this case it might not be worth forcing a split though, so I don't
 see any need to change the code.  I think the comment should be changed
 to give a different reason though.
 

 Sorry for the misleading comment.

 For case 17, we were trying to split large constant into two individual
 rtx patterns into sethi + addi so that we can have chance to match
 addi pattern with 16-bit instruction.

 But case 16 is different.
 This case is only produced at prologue/epilogue phase, using a temporary
 register $r15 to hold a large constant for adjusting stack pointer. 
 Since prologue/epilogue is after split1/split2 phase, we can only
 output sethi + ori directly.
 (The addi instruction with $r15 is a 32-bit instruction.)

But this code is in the output template of the define_insn.  That code
is only executed during final, after all passes have been run.  If the
template returns #, final will split the instruction itself, which is
possible even at that late stage.  # doesn't have any effect on the
passes themselves.

(FWIW, there's also a split3 pass that runs after prologue/epilogue
generation but before sched2.)

However, ISTR there is/was a rule that prologue instructions shouldn't
be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
Maybe you hit an ICE because of that?

Another way to handle this would be to have the movsi expander split
large constant moves.  When can_create_pseudo_p (), the intermediate
results can be stored in new registers, otherwise they should reuse
operands[0].  Two advantages to doing it that way are that high parts
can be shared before RA, and that calls to emit_move_insn from the
prologue code will split the move automatically.  I think many ports
do it that way (including MIPS FWIW).

Thanks,
Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Lin Tang
On 2013/10/6 05:57 PM, Richard Sandiford wrote:
  But case 16 is different.
  This case is only produced at prologue/epilogue phase, using a temporary
  register $r15 to hold a large constant for adjusting stack pointer. 
  Since prologue/epilogue is after split1/split2 phase, we can only
  output sethi + ori directly.
  (The addi instruction with $r15 is a 32-bit instruction.)
 But this code is in the output template of the define_insn.  That code
 is only executed during final, after all passes have been run.  If the
 template returns #, final will split the instruction itself, which is
 possible even at that late stage.  # doesn't have any effect on the
 passes themselves.
 
 (FWIW, there's also a split3 pass that runs after prologue/epilogue
 generation but before sched2.)
 
 However, ISTR there is/was a rule that prologue instructions shouldn't
 be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
 Maybe you hit an ICE because of that?
 
 Another way to handle this would be to have the movsi expander split
 large constant moves.  When can_create_pseudo_p (), the intermediate
 results can be stored in new registers, otherwise they should reuse
 operands[0].  Two advantages to doing it that way are that high parts
 can be shared before RA, and that calls to emit_move_insn from the
 prologue code will split the move automatically.  I think many ports
 do it that way (including MIPS FWIW).

FWIW, most ports usually just handle such large adjustment cases in
the prologue/epilogue code manually; either multiple SP-adjustments, or
use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
You might be able to get it to work, but trying to rely on the splitter
does not seem like best practice...

Chung-Lin



Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Richard Sandiford
Chung-Lin Tang clt...@codesourcery.com writes:
 On 2013/10/6 05:57 PM, Richard Sandiford wrote:
  But case 16 is different.
  This case is only produced at prologue/epilogue phase, using a temporary
  register $r15 to hold a large constant for adjusting stack pointer. 
  Since prologue/epilogue is after split1/split2 phase, we can only
  output sethi + ori directly.
  (The addi instruction with $r15 is a 32-bit instruction.)
 But this code is in the output template of the define_insn.  That code
 is only executed during final, after all passes have been run.  If the
 template returns #, final will split the instruction itself, which is
 possible even at that late stage.  # doesn't have any effect on the
 passes themselves.
 
 (FWIW, there's also a split3 pass that runs after prologue/epilogue
 generation but before sched2.)
 
 However, ISTR there is/was a rule that prologue instructions shouldn't
 be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
 Maybe you hit an ICE because of that?
 
 Another way to handle this would be to have the movsi expander split
 large constant moves.  When can_create_pseudo_p (), the intermediate
 results can be stored in new registers, otherwise they should reuse
 operands[0].  Two advantages to doing it that way are that high parts
 can be shared before RA, and that calls to emit_move_insn from the
 prologue code will split the move automatically.  I think many ports
 do it that way (including MIPS FWIW).

 FWIW, most ports usually just handle such large adjustment cases in
 the prologue/epilogue code manually; either multiple SP-adjustments, or
 use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
 You might be able to get it to work, but trying to rely on the splitter
 does not seem like best practice...

To be clear, I wasn't talking about relying on the splitter in the
define_split sense.  I was saying that the move expanders could
split large constants.

MIPS prologue code does use emit_move_insn to move large constants,
which automatically produces a split form from the outset.  I don't
really agree that it's bad practice.

Thanks,
Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Ju Wu
2013/10/6 Richard Sandiford rdsandif...@googlemail.com:
 Chung-Ju Wu jasonw...@gmail.com writes:
 On 10/2/13 1:31 AM, Richard Sandiford wrote:
 Chung-Ju Wu jasonw...@gmail.com writes:
 +  /* Use $r15, if the value is NOT in the range of Is20,
 + we must output sethi + ori directly since
 + we may already passed the split stage.  */
 +  return sethi\t%0, hi20(%1)\;ori\t%0, %0, lo12(%1);
 +case 17:
 +  return #;

 I don't really understand the comment for case 16.  Returning #
 (like for case 17) forces a split even at the output stage.

 In this case it might not be worth forcing a split though, so I don't
 see any need to change the code.  I think the comment should be changed
 to give a different reason though.


 Sorry for the misleading comment.

 For case 17, we were trying to split large constant into two individual
 rtx patterns into sethi + addi so that we can have chance to match
 addi pattern with 16-bit instruction.

 But case 16 is different.
 This case is only produced at prologue/epilogue phase, using a temporary
 register $r15 to hold a large constant for adjusting stack pointer.
 Since prologue/epilogue is after split1/split2 phase, we can only
 output sethi + ori directly.
 (The addi instruction with $r15 is a 32-bit instruction.)

 But this code is in the output template of the define_insn.  That code
 is only executed during final, after all passes have been run.  If the
 template returns #, final will split the instruction itself, which is
 possible even at that late stage.  # doesn't have any effect on the
 passes themselves.

 (FWIW, there's also a split3 pass that runs after prologue/epilogue
 generation but before sched2.)

 However, ISTR there is/was a rule that prologue instructions shouldn't
 be split, since they'd lose their RTX_FRAME_RELATED_P bit or something.
 Maybe you hit an ICE because of that?


Ah... yes, you are right.  In the nds32_force_addi_stack_int(),
I move a large constant to a temp register for stack pointer adjustment:

+  /* $r15 is going to be temporary register to hold the value.  */
+  tmp_reg = gen_rtx_REG (SImode, TA_REGNUM);
+
+  /* Create one more instruction to move value
+ into the temporary register.  */
+  value_move_insn = emit_move_insn (tmp_reg, GEN_INT (full_value));
+
+  /* At prologue, we need to tell GCC that this is frame related insn,
+ so that we can consider this instruction to output debug information.
+ If full_value is NEGATIVE, it means this function
+ is invoked by expand_prologue.  */
+  if (full_value  0)
+RTX_FRAME_RELATED_P (value_move_insn) = 1;
+
+  /* Create new 'add' rtx.  */
+  sp_adjust_insn = gen_addsi3 (stack_pointer_rtx,
+   stack_pointer_rtx,
+   tmp_reg);
+  /* Emit rtx into insn list and receive its transformed insn rtx.  */
+  sp_adjust_insn = emit_insn (sp_adjust_insn);
+
+  /* At prologue, we need to tell GCC that this is frame related insn,
+ so that we can consider this instruction to output debug information.
+ If full_value is NEGATIVE, it means this function
+ is invoked by expand_prologue.  */
+  if (full_value  0)
+RTX_FRAME_RELATED_P (sp_adjust_insn) = 1;

If there is a rule to avoid spliting instructions with RTX_FRAME_RELATED_P,
I think it is the case why I hit an ICE of unrecognized insn for
'value_move_insn'.

It seems that my comment to case 16 is incorrect.
Thanks for clarifying it.

 Another way to handle this would be to have the movsi expander split
 large constant moves.  When can_create_pseudo_p (), the intermediate
 results can be stored in new registers, otherwise they should reuse
 operands[0].  Two advantages to doing it that way are that high parts
 can be shared before RA, and that calls to emit_move_insn from the
 prologue code will split the move automatically.  I think many ports
 do it that way (including MIPS FWIW).


Do you mean that I should split large constant by myself in movsi
(or starting from movsi) for both case 16 and case 17?

Thanks for the suggestion.  I'll try to implement it. :)


Best regards,
jasonwucj


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-06 Thread Chung-Ju Wu
2013/10/6 Chung-Lin Tang clt...@codesourcery.com:
 On 2013/10/6 下午 06:33, Richard Sandiford wrote:
 Chung-Lin Tang clt...@codesourcery.com writes:
 On 2013/10/6 05:57 PM, Richard Sandiford wrote:
 Another way to handle this would be to have the movsi expander split
 large constant moves.  When can_create_pseudo_p (), the intermediate
 results can be stored in new registers, otherwise they should reuse
 operands[0].  Two advantages to doing it that way are that high parts
 can be shared before RA, and that calls to emit_move_insn from the
 prologue code will split the move automatically.  I think many ports
 do it that way (including MIPS FWIW).

 FWIW, most ports usually just handle such large adjustment cases in
 the prologue/epilogue code manually; either multiple SP-adjustments, or
 use of a temp register (better control of RTX_FRAME_RELATED_P anyways).
 You might be able to get it to work, but trying to rely on the splitter
 does not seem like best practice...

 To be clear, I wasn't talking about relying on the splitter in the
 define_split sense.  I was saying that the move expanders could
 split large constants.

 MIPS prologue code does use emit_move_insn to move large constants,
 which automatically produces a split form from the outset.  I don't
 really agree that it's bad practice.

 I think that's mostly the same as what I meant by manually; it seems
 that there's lots of MIPS backend machinery starting from
 mips_legitimize_move(), so it's not really automatic ;)

 Chung-Lin


Hi, Chung-Lin,

Thanks for the hint. ^_^

I will follow Richard and your suggestion to split large constant
via movsi manually.  So that it will be automatically split whenever
emit_move_insn() is used. :)


Best regards,
jasonwucj


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-10-01 Thread Richard Sandiford
Chung-Ju Wu jasonw...@gmail.com writes:
 +  /* Use $r15, if the value is NOT in the range of Is20,
 + we must output sethi + ori directly since
 + we may already passed the split stage.  */
 +  return sethi\t%0, hi20(%1)\;ori\t%0, %0, lo12(%1);
 +case 17:
 +  return #;

I don't really understand the comment for case 16.  Returning #
(like for case 17) forces a split even at the output stage.

In this case it might not be worth forcing a split though, so I don't
see any need to change the code.  I think the comment should be changed
to give a different reason though.

 +   /* Note that (le:SI X INT_MAX) is not the same as (lt:SI X INT_MIN).
 +  We better have an assert here in case GCC does not properly
 +  optimize it away.  */
 +   gcc_assert (code != LE || INTVAL (operands[2]) != INT_MAX);

Sorry, I was being lazy when I said INT_MAX.  I really meant INT_MAX on
the target (assuming SImode == int), whereas INT_MAX here is a host thing.
0x7fff would be OK.

 +  /* Create RbRe_str string.
 + Note that we need to output ',' character if there exists En4 field.  */
 +  if (REGNO (operands[0]) != SP_REGNUM  REGNO (operands[1]) != SP_REGNUM)
 +  RbRe_str = INTVAL (operands[2]) != 0 ? %0, %1,  : %0, %1;
 +  else
 +  RbRe_str = ;

The RbRe_str = lines should only be indented by 2 extra spaces, not 4.
Same for pop.

Looks good otherwise, thanks.

Richard


Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-09-14 Thread Richard Sandiford
Some comments for part 2.

Chung-Ju Wu jasonw...@gmail.com writes:
 +;; Include intrinsic functions definition.
 +(include nds32.intrinsic.md)
 +
 +;; Include block move for nds32 multiple load/store behavior.
 +(include nds32.multiple.md)
 +
 +;; Include DImode/DFmode operations.
 +(include nds32.doubleword.md)
 +
 +;; Include peephole patterns.
 +(include nds32.peephole2.md)

Usual gcc style is to use - rather than . as a word separator in
filenames.

 +(define_insn *store_si
 +  [(set (match_operand:SI 0 memory_operand   =U45, U33, U37, U45, m)
 + (match_operand:SI 1 register_operandl,   l,   l,   d, r))]
 +  

Loads, stores, register moves and constant moves should normally be in
the same pattern, so that anything operating on constraints can see all
the alternatives at once.  This might not be as important for LRA as it
was for reload, but it still seems like good practice.

 +(define_insn *movmode
 +  [(set (match_operand:QIHISI 0 register_operand =r, m, r)
 + (match_operand:QIHISI 1 register_operand  r, r, m))]
 +  
 +{
 +  switch (which_alternative)
 +{
 +case 0:
 +  if (get_attr_length (insn) == 2)
 + return mov55\t%0, %1;
 +  else
 + return ori\t%0, %1, 0;
 +case 1:
 +  return nds32_output_32bit_store (operands, byte);
 +case 2:
 +  return nds32_output_32bit_load (operands, byte);
 +
 +default:
 +  gcc_unreachable ();
 +}
 +}
 +  [(set_attr type alu,store,load)
 +   (set_attr enabled 1)
 +   (set_attr_alternative length
 + [
 +   ;; Alternative 0
 +   (if_then_else (match_test TARGET_16_BIT)
 +  (const_int 2)
 +  (const_int 4))
 +   ;; Alternative 1
 +   (const_int 4)
 +   ;; Alternative 2
 +   (const_int 4)
 + ])])

The style used in the load and store patterns was:

(define_insn *movmode
  [(set (match_operand:QIHISI 0 register_operand =r, r, m, r)
(match_operand:QIHISI 1 register_operand  r, r, r, m))]
  
{
  switch (which_alternative)
{
case 0:
  return mov55\t%0, %1;
case 1:
  return ori\t%0, %1, 0;
case 2:
  return nds32_output_32bit_store (operands, byte);
case 3:
  return nds32_output_32bit_load (operands, byte);
default:
  gcc_unreachable ();
}
}
  [(set_attr type alu,alu,store,load)
   (set_attr length 2,4,4,4)])

which seems neater.  Did you try that but find that it didn't work here?

Same comment for other instructions where:

   (if_then_else (match_test TARGET_16_BIT)
 (const_int 2)
 (const_int 4))

occurs (except for the special case of relaxable branch instructions,
where using the if_then_else is good).

 +;; We use nds32_symbolic_operand to limit that only 
 CONST/SYMBOL_REF/LABEL_REF
 +;; are able to match such instruction template.
 +(define_insn *move_addr
 +  [(set (match_operand:SI 0 register_operand   =l, r)
 + (match_operand:SI 1 nds32_symbolic_operand  i, i))]
 +  
 +  la\t%0, %1
 +  [(set_attr type move)
 +   (set_attr length  8)])
 +
 +
 +(define_insn *sethi
 +  [(set (match_operand:SI 0 register_operand   =r)
 + (high:SI (match_operand:SI 1 immediate_operand  i)))]
 +  
 +{
 +  return sethi\t%0, hi20(%1);
 +}
 +  [(set_attr type alu)
 +   (set_attr length 4)])
 +
 +
 +(define_insn *lo_sum
 +  [(set (match_operand:SI 0 register_operand =r)
 + (lo_sum:SI (match_operand:SI 1 register_operand   0)
 +(match_operand:SI 2 immediate_operand  i)))]
 +  
 +  ori\t%0, %1, lo12(%2)
 +  [(set_attr type alu)
 +   (set_attr length 4)])

Very minor, but nds32_symbolic_operand seems like a better choice for
*sethi and *lo_sum too, since (high ...) and (lo_sum ...) shouldn't be
used for const_ints.

Any pass would be in its rights to fuse a *sethi and *lo_sum pair back
into a single *move_addr.  Is that something you want to allow?
(That's a genuine question rather than a review comment btw.)

Is the 0 constraint on the *lo_sum really necessary?  It looks from
the later OR patterns as though this form of ORI allows the source and
destination registers to be different.

 +;; Zero extension instructions.
 +
 +(define_expand zero_extendmodesi2
 +  [(set (match_operand:SI 0 general_operand )
 + (zero_extend:SI (match_operand:QIHI 1 general_operand )))]
 +  
 +{
 +  rtx tmp_reg;
 +
 +  /* We need to make sure operands[1] is a register.  */
 +  if (!REG_P (operands[1]))
 +operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);

Why do you need this?  It looks like the architecture has zero-extending loads.

 +
 +  /* If the pattern is (mem X) - (zero_extend (reg Y)),
 + we create two rtx patterns:
 +   (reg:SI K) - (zero_extend:SI (reg Y))
 +   (mem:SI X) - (reg:SI K)
 + The first rtx will be matched by '*zero_extendmodesi2_reg' template,
 + and the second rtx will be matched by mov naming pattern.  */
 +  if (MEM_P (operands[0]))
 +{
 +  tmp_reg = gen_reg_rtx (SImode);
 +

Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).

2013-07-09 Thread Joseph S. Myers
On Mon, 8 Jul 2013, Chung-Ju Wu wrote:

 +/* This is used to identify used ISA when doing code generation.
 +   Initialize it with macro TARGET_DEFAULT_ISA,
 +   which is defined in nds32-isa-xxx.h file.
 +   User can specify it by using '-misa=X' option.  */
 +enum nds32_isa nds32_selected_isa = TARGET_DEFAULT_ISA;

Rather than using global state, put this in the gcc_options structure 
using a Variable entry in your .opt file.

 +   warning (0, For the option -misr-vector-size=X, the valid X 
 +   must be: 4 or 16);

The diagnostics in this function should all not start with a capital 
letter.  Invalid arguments to an option should be errors, not warnings.  
Since you have a location passed to this function, use error_at rather 
than the legacy functions that implicitly use input_location.

 +case OPT_misa_:
 +  /* Check valid ISA: v2 v3 v3m.  */
 +  if (strcmp (arg, v2) == 0)

Use Enum in the .opt file and get all the conversion from strings to 
integer values, and errors for unknown values, done automatically.

-- 
Joseph S. Myers
jos...@codesourcery.com