Re: [PATCH v4 01/10] Initial TI PRU GCC port

2018-09-24 Thread Dimitar Dimitrov
On Monday, 9/24/2018 11:38:23 EEST Richard Sandiford wrote:
> Dimitar Dimitrov  writes:
> > On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
> >> Dimitar Dimitrov  writes:
> >> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
> >> > compliance.  */ +static tree
> >> > +check_op_callback (tree *tp, int *walk_subtrees, void *data)
> >> > +{
> >> > +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> >> > +
> >> > +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) ==
> >> > ENUMERAL_TYPE)
> >> > +{
> >> > +  /* Forward declarations have NULL tree type.  Skip them.  */
> >> > +  if (TREE_TYPE (*tp) == NULL)
> >> > +return NULL;
> >> > +}
> >> > +
> >> > +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around? 
> >> > */
> >> > +  if (TREE_TYPE (*tp) == NULL)
> >> > +return NULL;
> >> > +
> >> > +  const tree type = TREE_TYPE (*tp);
> >> > +
> >> > +  /* Direct function calls are allowed, obviously.  */
> >> > +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) ==
> >> > POINTER_TYPE)
> >> > +{
> >> > +  const tree ptype = TREE_TYPE (type);
> >> > +  if (TREE_CODE (ptype) == FUNCTION_TYPE)
> >> > +return NULL;
> >> > +}
> >> 
> >> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
> >> 
> >> us to skip things like:
> >>(void *) func
> >> 
> >> ?  (OK, that's dubious C, but we do support it.)
> > 
> > The cast yields a "data pointer", which is 32 bits for both types of ABI.
> > Hence it is safe to skip "(void *) func".
> > 
> > The TI ABI's 16 bit function pointers become a problem when they change
> > the
> > layout of structures and function argument registers.
> 
> OK.  The reason this stood out is that the code doesn't obviously match
> the comment.  If the code is just trying to skip direct function calls,
> I think the gcall sequence I mentioned would be more obvious, and would
> match the existing comment.  If anything that takes the address of a
> function is OK then it might be worth changing the comment to include that.
I will use your suggested gcall snippet since it is safe and obvious. The 
comment matches my original intent.


> >> > +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
> >> > +static reg_class_t
> >> > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t
> >> > regclass)
> >> > +{
> >> > +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
> >> > +}
> >> 
> >> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
> >> original class rather than add new registers.  Can you remember why
> >> it was needed?
> > 
> > I'm not sure what target code is supposed to return when NO_REGS is passed
> > here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So
> > I
> > followed suite.
> > 
> > Here is a backtrace of LRA when NO_REGS is passed:
> > 0xf629ae pru_preferred_reload_class
> > 
> > ../../gcc/gcc/config/pru/pru.c:788
> > 
> > 0xa3d6e8 process_alt_operands
> > 
> > ../../gcc/gcc/lra-constraints.c:2600
> > 
> > 0xa3ef7d curr_insn_transform
> > 
> > ../../gcc/gcc/lra-constraints.c:3889
> > 
> > 0xa4301e lra_constraints(bool)
> > 
> > ../../gcc/gcc/lra-constraints.c:4906
> > 
> > 0xa2726c lra(_IO_FILE*)
> > 
> > ../../gcc/gcc/lra.c:2446
> > 
> > 0x9c97a9 do_reload
> > 
> > ../../gcc/gcc/ira.c:5469
> > 
> > 0x9c97a9 execute
> > 
> > ../../gcc/gcc/ira.c:5653
> 
> I think it should just pass NO_REGS through unmodified (which means
> spill to memory).  In some ways it would be good if it didn't have to
> handle this case, but again that's going to be work to fix.
> 
> The RA will pass ALL_REGS if it can handle any register, and wants to
> know what the target would prefer.  But for any input, the hook needs
> to stick within the registers it was given.

Thanks for the clarification. I will remove the PRU hook and will rely on the 
default implementation (i.e. return the given rclass).

> >> > +; LRA cannot cope with clobbered op2, hence the scratch register.
> >> > +(define_insn "ashr3"
> >> > +  [(set (match_operand:QISI 0 "register_operand""=,r")
> >> > +  (ashiftrt:QISI
> >> > +(match_operand:QISI 1 "register_operand""0,r")
> >> > +(match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
> >> > +   (clobber (match_scratch:QISI 3   "=r,X"))]
> >> > +  ""
> >> > +  "@
> >> > +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\;
> >> > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 *
> >> > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: +   lsr\\t%0, %1, 1\;qbbc LSIGN%=,
> >> > %0,
> >> > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" +  [(set_attr
> >> > "type"
> >> > "complex,alu")
> >> > +   (set_attr "length" "28,4")])
> >> 
> >> What do you mean by LRA not coping?  What did you try originally and
> >> what went wrong?
> > 
> > Better assembly could 

Re: [PATCH v4 01/10] Initial TI PRU GCC port

2018-09-24 Thread Jeff Law
On 9/24/18 4:38 AM, Richard Sandiford wrote:
> Dimitar Dimitrov  writes:
>> On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
>>> Dimitar Dimitrov  writes:
 +(define_insn
 "sub_impl_>>> _zext_op2>" +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
 +  (minus:EQD
 +   (zero_extend:EQD
 +(match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
 +   (zero_extend:EQD
 +(match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"]
 +  ""
 +  "@
 +   sub\\t%0, %1, %2
 +   sub\\t%0, %1, %2
 +   rsb\\t%0, %2, %1"
 +  [(set_attr "type" "alu")
 +   (set_attr "length" "4")])
>>>
>>> By convention, subtraction patterns shouldn't accept constants for
>>> operand 2.  Constants should instead be subtracted using an addition
>>> of the negative.
>> Understood. I will remove second alternative. But I will leave the third one 
>> since it enables an important optimization:
>>
>>unsigned test(unsigned a)
>>{
>> return 10-a;
>>}
>>
>> RTL:
>> (insn 6 3 7 2 (set (reg:SI 152)
>> (minus:SI (const_int 10 [0xa])
>> (reg/v:SI 151 [ a ]))) "test.c":430 -1
>>  (nil))
>>
>> Assembly:
>> test:
>> rsb r14, r14, 10
>> ret
> 
> Thanks.  And yeah, the final alternative is fine.
> 
 +;; Return true if OP is a text segment reference.
 +;; This is needed for program memory address expressions.  Borrowed from
 AVR. +(define_predicate "text_segment_operand"
 +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
 +{
 +  switch (GET_CODE (op))
 +{
 +case CODE_LABEL:
 +  return true;
 +case LABEL_REF :
 +  return true;
 +case SYMBOL_REF :
 +  return SYMBOL_REF_FUNCTION_P (op);
 +case PLUS :
 +case MINUS :
 +  /* Assume canonical format of symbol + constant.
 +   Fall through.  */
 +case CONST :
 +  return text_segment_operand (XEXP (op, 0), VOIDmode);
 +default :
 +  return false;
 +}
 +})
>>>
>>> This probably comes from AVR, but: no spaces before ":".
>>>
>>> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
>>> Do you know if that triggers in practice, and if so where?
>> Indeed, CODE_LABEL case is never reached. I'll leave gcc_unreachable here.
>>
>>> An IMO neater and slightly more robust way of writing the body is:
>>>   poly_int64 offset:
>>>   rtx base = strip_offset (op, );
>>>   switch (GET_CODE (base))
>>>   
>>> {
>>> 
>>> case LABEL_REF:
>>>   ...as above...
>>> 
>>> case SYMBOL_REF:
>>>   ...as above...
>>> 
>>> default:
>>>   return false;
>>> 
>>> }
>>>
>>> with "plus" and "minus" not in the match_code list (since they should
>>> always appear in consts if they really are text references).
>>
>> The "plus" and "minus" are needed when handling code labels as values. Take 
>> for example the following construct:
>>
>>int x = & - &
>> lab1:
>>   ...
>> lab2:
>>
>>
>> My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In 
>> the 
>> above case it is passed the following RTL expression:
>> (minus:SI (label_ref/v:SI 20)
>> (label_ref/v:SI 27))
>>
>> I need to detect text labels so that I annotate them with %pmem:
>> .4byte   %pmem(.L4-(.L2))
>> Instead of the incorrect:
>> .4byte   .L3-(.L2)
> 
> OK, thanks for the explanation.  I think the target-independent code should
> really be passing (const (minus ...)) rather than a plain (minus ...) here,
> but that's going to be difficult to change at this stage.  And like you say,
> the split_offset suggestion wouldn't have handled this anyway.
> 
> So yeah, please keep what you have now.
> 
 +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
 compliance.  */ +static tree
 +check_op_callback (tree *tp, int *walk_subtrees, void *data)
 +{
 +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
 +
 +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
 +{
 +  /* Forward declarations have NULL tree type.  Skip them.  */
 +  if (TREE_TYPE (*tp) == NULL)
 +  return NULL;
 +}
 +
 +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around?  */
 +  if (TREE_TYPE (*tp) == NULL)
 +return NULL;
 +
 +  const tree type = TREE_TYPE (*tp);
 +
 +  /* Direct function calls are allowed, obviously.  */
 +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
 +{
 +  const tree ptype = TREE_TYPE (type);
 +  if (TREE_CODE (ptype) == FUNCTION_TYPE)
 +  return NULL;
 +}
>>>
>>> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
>>> us to skip things like:
>>>
>>>(void *) func
>>>
>>> ?  (OK, that's dubious C, but we do support it.)
>> The cast yields a "data 

Re: [PATCH v4 01/10] Initial TI PRU GCC port

2018-09-24 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
>> Dimitar Dimitrov  writes:
>> > +(define_insn
>> > "sub_impl_> > _zext_op2>" +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
>> > +  (minus:EQD
>> > +   (zero_extend:EQD
>> > +(match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
>> > +   (zero_extend:EQD
>> > +(match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"]
>> > +  ""
>> > +  "@
>> > +   sub\\t%0, %1, %2
>> > +   sub\\t%0, %1, %2
>> > +   rsb\\t%0, %2, %1"
>> > +  [(set_attr "type" "alu")
>> > +   (set_attr "length" "4")])
>> 
>> By convention, subtraction patterns shouldn't accept constants for
>> operand 2.  Constants should instead be subtracted using an addition
>> of the negative.
> Understood. I will remove second alternative. But I will leave the third one 
> since it enables an important optimization:
>
>unsigned test(unsigned a)
>{
> return 10-a;
>}
>
> RTL:
> (insn 6 3 7 2 (set (reg:SI 152)
> (minus:SI (const_int 10 [0xa])
> (reg/v:SI 151 [ a ]))) "test.c":430 -1
>  (nil))
>
> Assembly:
> test:
> rsb r14, r14, 10
> ret

Thanks.  And yeah, the final alternative is fine.

>> > +;; Return true if OP is a text segment reference.
>> > +;; This is needed for program memory address expressions.  Borrowed from
>> > AVR. +(define_predicate "text_segment_operand"
>> > +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
>> > +{
>> > +  switch (GET_CODE (op))
>> > +{
>> > +case CODE_LABEL:
>> > +  return true;
>> > +case LABEL_REF :
>> > +  return true;
>> > +case SYMBOL_REF :
>> > +  return SYMBOL_REF_FUNCTION_P (op);
>> > +case PLUS :
>> > +case MINUS :
>> > +  /* Assume canonical format of symbol + constant.
>> > +   Fall through.  */
>> > +case CONST :
>> > +  return text_segment_operand (XEXP (op, 0), VOIDmode);
>> > +default :
>> > +  return false;
>> > +}
>> > +})
>> 
>> This probably comes from AVR, but: no spaces before ":".
>> 
>> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
>> Do you know if that triggers in practice, and if so where?
> Indeed, CODE_LABEL case is never reached. I'll leave gcc_unreachable here.
>
>> An IMO neater and slightly more robust way of writing the body is:
>>   poly_int64 offset:
>>   rtx base = strip_offset (op, );
>>   switch (GET_CODE (base))
>>   
>> {
>> 
>> case LABEL_REF:
>>   ...as above...
>> 
>> case SYMBOL_REF:
>>   ...as above...
>> 
>> default:
>>   return false;
>> 
>> }
>> 
>> with "plus" and "minus" not in the match_code list (since they should
>> always appear in consts if they really are text references).
>
> The "plus" and "minus" are needed when handling code labels as values. Take 
> for example the following construct:
>
>int x = & - &
> lab1:
>   ...
> lab2:
>
>
> My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In 
> the 
> above case it is passed the following RTL expression:
> (minus:SI (label_ref/v:SI 20)
> (label_ref/v:SI 27))
>
> I need to detect text labels so that I annotate them with %pmem:
> .4byte%pmem(.L4-(.L2))
> Instead of the incorrect:
> .4byte   .L3-(.L2)

OK, thanks for the explanation.  I think the target-independent code should
really be passing (const (minus ...)) rather than a plain (minus ...) here,
but that's going to be difficult to change at this stage.  And like you say,
the split_offset suggestion wouldn't have handled this anyway.

So yeah, please keep what you have now.

>> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
>> > compliance.  */ +static tree
>> > +check_op_callback (tree *tp, int *walk_subtrees, void *data)
>> > +{
>> > +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
>> > +
>> > +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
>> > +{
>> > +  /* Forward declarations have NULL tree type.  Skip them.  */
>> > +  if (TREE_TYPE (*tp) == NULL)
>> > +  return NULL;
>> > +}
>> > +
>> > +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around?  */
>> > +  if (TREE_TYPE (*tp) == NULL)
>> > +return NULL;
>> > +
>> > +  const tree type = TREE_TYPE (*tp);
>> > +
>> > +  /* Direct function calls are allowed, obviously.  */
>> > +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
>> > +{
>> > +  const tree ptype = TREE_TYPE (type);
>> > +  if (TREE_CODE (ptype) == FUNCTION_TYPE)
>> > +  return NULL;
>> > +}
>> 
>> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
>> us to skip things like:
>> 
>>(void *) func
>> 
>> ?  (OK, that's dubious C, but we do support it.)
> The cast yields a "data pointer", which is 32 bits for both types of ABI. 
> Hence it is safe to skip "(void *) func".
>
> The TI ABI's 16 bit function pointers become a 

Re: [PATCH v4 01/10] Initial TI PRU GCC port

2018-09-22 Thread Dimitar Dimitrov
On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
> Dimitar Dimitrov  writes:
> > +(define_insn
> > "sub_impl_ > _zext_op2>" +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
> > +   (minus:EQD
> > +(zero_extend:EQD
> > + (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
> > +(zero_extend:EQD
> > + (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"]
> > +  ""
> > +  "@
> > +   sub\\t%0, %1, %2
> > +   sub\\t%0, %1, %2
> > +   rsb\\t%0, %2, %1"
> > +  [(set_attr "type" "alu")
> > +   (set_attr "length" "4")])
> 
> By convention, subtraction patterns shouldn't accept constants for
> operand 2.  Constants should instead be subtracted using an addition
> of the negative.
Understood. I will remove second alternative. But I will leave the third one 
since it enables an important optimization:

   unsigned test(unsigned a)
   {
return 10-a;
   }

RTL:
(insn 6 3 7 2 (set (reg:SI 152)
(minus:SI (const_int 10 [0xa])
(reg/v:SI 151 [ a ]))) "test.c":430 -1
 (nil))

Assembly:
test:
rsb r14, r14, 10
ret

> 
> > +(define_constraint "I"
> > +  "An unsigned 8-bit constant."
> > +  (and (match_code "const_int")
> > +   (match_test "UBYTE_INT (ival)")))
> 
> As it stands this will reject QImode constants with the top bit set,
> since const_ints are always stored in sign-extended form.  E.g. QImode
> 128 is stored as (const_int -128) rather than (const_int 128).
> 
> Unfortunately this is difficult to fix in a clean way, since
> const_ints don't store their mode (a long-standing wart) and unlike
> predicates, constraints aren't given a mode from context.  The best
> way I can think of coping with it is:
> 
> a) have a separate constraint for -128...127
> b) add a define_mode_attr that maps QI to the new constraint and
>HI and SI to I
> c) use  etc. instead of I in the match_operands
> 
> Similar comment for "J" and HImode, although you already have the
> "N" as the corresponding signed constraint and so don't need a new one.
Thank you. This strategy worked for QImode. I will include the changes in my 
next patch.

Since PRU ALU operations do not have 16-bit constant operands, there is no 
need to add define_mode_attr for HImode. The "mov" pattern already handles the 
"N" [-32768, 32767] constraint.

> 
> 
> > +;; Return true if OP is a text segment reference.
> > +;; This is needed for program memory address expressions.  Borrowed from
> > AVR. +(define_predicate "text_segment_operand"
> > +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
> > +{
> > +  switch (GET_CODE (op))
> > +{
> > +case CODE_LABEL:
> > +  return true;
> > +case LABEL_REF :
> > +  return true;
> > +case SYMBOL_REF :
> > +  return SYMBOL_REF_FUNCTION_P (op);
> > +case PLUS :
> > +case MINUS :
> > +  /* Assume canonical format of symbol + constant.
> > +Fall through.  */
> > +case CONST :
> > +  return text_segment_operand (XEXP (op, 0), VOIDmode);
> > +default :
> > +  return false;
> > +}
> > +})
> 
> This probably comes from AVR, but: no spaces before ":".
> 
> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
> Do you know if that triggers in practice, and if so where?
Indeed, CODE_LABEL case is never reached. I'll leave gcc_unreachable here.

> An IMO neater and slightly more robust way of writing the body is:
>   poly_int64 offset:
>   rtx base = strip_offset (op, );
>   switch (GET_CODE (base))
>   
> {
> 
> case LABEL_REF:
>   ...as above...
> 
> case SYMBOL_REF:
>   ...as above...
> 
> default:
>   return false;
> 
> }
> 
> with "plus" and "minus" not in the match_code list (since they should
> always appear in consts if they really are text references).

The "plus" and "minus" are needed when handling code labels as values. Take 
for example the following construct:

   int x = & - &
lab1:
  ...
lab2:


My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In the 
above case it is passed the following RTL expression:
(minus:SI (label_ref/v:SI 20)
(label_ref/v:SI 27))

I need to detect text labels so that I annotate them with %pmem:
.4byte  %pmem(.L4-(.L2))
Instead of the incorrect:
.4byte   .L3-(.L2)


> > +;; Return true if OP is a load multiple operation.  It is known to be a
> > +;; PARALLEL and the first section will be tested.
> > +
> > +(define_special_predicate "load_multiple_operation"
> > +  (match_code "parallel")
> > +{
> > +  machine_mode elt_mode;
> > +  int count = XVECLEN (op, 0);
> > +  unsigned int dest_regno;
> > +  rtx src_addr;
> > +  int i, off;
> > +
> > +  /* Perform a quick check so we don't blow up below.  */
> > +  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> > +  || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG
> > +  || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM)
> > +return false;
> > +
> > +  dest_regno = 

Re: [PATCH v4 01/10] Initial TI PRU GCC port

2018-09-13 Thread Richard Sandiford
Dimitar Dimitrov  writes:
> +; Specialized IOR/AND patterns for matching setbit/clearbit instructions.
> +;
> +; TODO - allow clrbit and setbit to support (1 << REG) constructs
> +
> +(define_insn "clearbit__"
> +  [(set (match_operand:EQD 0 "register_operand"  "=r")

Nit: stray tab instead of space.

> + (and:EQD
> +   (zero_extend:EQD
> + (match_operand:EQS0 1 "register_operand" "r"))
> +   (match_operand:EQD 2 "single_zero_operand" "n")))]
> +  ""
> +  "clr\\t%0, %1, %V2"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "4")])

Very minor suggestion, doesn't need to hold up acceptance, but: some
patterns (like this one) have an explicit length of 4, but others have
an implicit length of 4 thanks to the default value in the define_attr.
It would be more consistent to do one or the other everywhere.

Using EQD and EQS0 like this has the unfortunate effect of creating
patterns like:

   ... (zero_extend:SI (match_operand:SI ...)) ...

(which you rely on for the define_substs) and:

   ... (zero_extend:HI (match_operand:SI ...)) ...

even though these are both invalid rtl.  That said, the rtl passes
should never generate this kind of rtl (i.e. they shouldn't rely on
targets to reject it), so that's probably fine in practice.  It just
adds a bit of bloat.  I also don't have a good suggestion how to fix it
without more infrastructure.  Adding:

  GET_MODE_SIZE (mode) >= GET_MODE_SIZE (mode)

should cause gencondmd to remove the cases in which the zero_extend
result is narrower than the input, but doing that everywhere would
be ugly, and would still leave the case in which the zero_extend
result is the same size as the input (which you can't remove without
breaking the define_substs).

So after all that, I think this is fine as-is.

> +(define_insn 
> "sub_impl_"
> +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
> + (minus:EQD
> +  (zero_extend:EQD
> +   (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
> +  (zero_extend:EQD
> +   (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"]
> +  ""
> +  "@
> +   sub\\t%0, %1, %2
> +   sub\\t%0, %1, %2
> +   rsb\\t%0, %2, %1"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "4")])

By convention, subtraction patterns shouldn't accept constants for
operand 2.  Constants should instead be subtracted using an addition
of the negative.

> +(define_constraint "I"
> +  "An unsigned 8-bit constant."
> +  (and (match_code "const_int")
> +   (match_test "UBYTE_INT (ival)")))

As it stands this will reject QImode constants with the top bit set,
since const_ints are always stored in sign-extended form.  E.g. QImode
128 is stored as (const_int -128) rather than (const_int 128).

Unfortunately this is difficult to fix in a clean way, since
const_ints don't store their mode (a long-standing wart) and unlike
predicates, constraints aren't given a mode from context.  The best
way I can think of coping with it is:

a) have a separate constraint for -128...127
b) add a define_mode_attr that maps QI to the new constraint and
   HI and SI to I
c) use  etc. instead of I in the match_operands

Similar comment for "J" and HImode, although you already have the
"N" as the corresponding signed constraint and so don't need a new one.

> +(define_predicate "const_1_operand"
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (INTVAL (op), 1, 1)")))

INTVAL (op) == 1 seems more obvious.

> +(define_predicate "const_ubyte_operand"
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (INTVAL (op), 0, 0xff)")))
> +
> +(define_predicate "const_uhword_operand"
> +  (and (match_code "const_int")
> +   (match_test "IN_RANGE (INTVAL (op), 0, 0x)")))

Here you can use "INTVAL (op) & GET_MODE_MASK (mode)" to avoid the
problem above, as long as you always pass a mode to these predicates.

> +(define_predicate "ctable_addr_operand"
> +  (and (match_code "const_int")
> +   (match_test ("(pru_get_ctable_base_index (INTVAL (op)) >= 0)"
> +
> +(define_predicate "ctable_base_operand"
> +  (and (match_code "const_int")
> +   (match_test ("(pru_get_ctable_exact_base_index (INTVAL (op)) >= 
> 0)"

Redundant brackets around the match_test strings.

(I want to rip out that syntax at some point, since allowing brackets
around strings just makes the files harder for scripting tools to parse.)

> +;; Return true if OP is a text segment reference.
> +;; This is needed for program memory address expressions.  Borrowed from AVR.
> +(define_predicate "text_segment_operand"
> +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
> +{
> +  switch (GET_CODE (op))
> +{
> +case CODE_LABEL:
> +  return true;
> +case LABEL_REF :
> +  return true;
> +case SYMBOL_REF :
> +  return SYMBOL_REF_FUNCTION_P (op);
> +case PLUS :
> +case MINUS :
> +  /* Assume canonical format of symbol + constant.
> +  Fall through.  */
> +case CONST :
> +