[PATCH] correct max alignment check in symtab.c

2018-10-22 Thread Paul Koning
GCC was hitting an ICE on some pdp11 tests due to a typo in a max alignment 
check.

Committed as obvious.

paul

ChangeLog:

2018-10-22  Paul Koning  

* symtab.c (symtab_node::increase_alignment): Correct max
alignment check.

Index: symtab.c
===
--- symtab.c(revision 265403)
+++ symtab.c(working copy)
@@ -2205,7 +2205,7 @@ increase_alignment_1 (symtab_node *n, void *v)
 void
 symtab_node::increase_alignment (unsigned int align)
 {
-  gcc_assert (can_increase_alignment_p () && align < MAX_OFILE_ALIGNMENT);
+  gcc_assert (can_increase_alignment_p () && align <= MAX_OFILE_ALIGNMENT);
   ultimate_alias_target()->call_for_symbol_and_aliases (increase_alignment_1,
(void *)(size_t) align,
true);



Re: [PATCH] add udivhi3, umodhi3 functions to libgcc

2018-10-18 Thread Paul Koning



> On Oct 18, 2018, at 1:18 PM, Jeff Law  wrote:
> 
> On 10/17/18 5:48 PM, Paul Koning wrote:
>> This is a revision of a patch I proposed a while back, to add udivhi3 and 
>> umodhi3 functions to libgcc since some platforms (like pdp11) need it.  The 
>> code is adopted from that of udivsi3.
>> 
>> In earlier discussion it was pointed out that internal functions need to 
>> start with __.  The code I had copied does not do that, so I corrected mine 
>> and also changed the existing code to conform to the rules.
>> 
>> Ok for trunk?
>> 
>>  paul
>> 
>> ChangeLog:
>> 
>> 2018-10-17  Paul Koning  
>> 
>>  * udivmodsi4.c (__udivmodsi4): Rename to conform to coding
>>  standard.
>>  * udivmod.c: Update references to __udivmodsi4.
>>  * udivhi3.c: New file.
>>  * udivmodhi4.c: New file.
>>  * config/pdp11/t-pdp11 (LIB2ADD): Add the new files.
> I think you need to fix divmod.c as well since it calls udivmodsi4.  OK
> with that fixed.
> 
> Jeff

Thanks.  Committed as shown below.

paul

ChangeLog:

2018-10-18  Paul Koning  

* udivmodsi4.c (__udivmodsi4): Rename to conform to coding
standard.
* divmod.c: Update references to __udivmodsi4.
* udivmod.c: Ditto.
* udivhi3.c: New file.
* udivmodhi4.c: New file.
* config/pdp11/t-pdp11 (LIB2ADD): Add the new files.

Index: config/pdp11/t-pdp11
===
--- config/pdp11/t-pdp11(revision 265276)
+++ config/pdp11/t-pdp11(working copy)
@@ -1,5 +1,7 @@
 LIB2ADD = $(srcdir)/udivmod.c \
  $(srcdir)/udivmodsi4.c \
+ $(srcdir)/udivhi3.c \
+ $(srcdir)/udivmodhi4.c \
  $(srcdir)/memcmp.c \
  $(srcdir)/memcpy.c \
  $(srcdir)/memmove.c \
Index: divmod.c
===
--- divmod.c(revision 265276)
+++ divmod.c(working copy)
@@ -21,7 +21,8 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
-long udivmodsi4 ();
+extern unsigned long __udivmodsi4(unsigned long num, unsigned long den,
+ int 
modwanted);
 
 long
 __divsi3 (long a, long b)
@@ -41,7 +42,7 @@ __divsi3 (long a, long b)
   neg = !neg;
 }
 
-  res = udivmodsi4 (a, b, 0);
+  res = __udivmodsi4 (a, b, 0);
 
   if (neg)
 res = -res;
@@ -64,7 +65,7 @@ __modsi3 (long a, long b)
   if (b < 0)
 b = -b;
 
-  res = udivmodsi4 (a, b, 1);
+  res = __udivmodsi4 (a, b, 1);
 
   if (neg)
 res = -res;
Index: udivhi3.c
===
--- udivhi3.c   (nonexistent)
+++ udivhi3.c   (working copy)
@@ -0,0 +1,38 @@
+/* Copyright (C) 2000-2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+extern unsigned short __udivmodhi4(unsigned short num, unsigned short den,
+  int 
modwanted);
+
+unsigned short
+__udivhi3 (unsigned short a, unsigned short b)
+{
+  return __udivmodhi4 (a, b, 0);
+}
+
+unsigned short
+__umodhi3 (unsigned short a, unsigned short b)
+{
+  return __udivmodhi4 (a, b, 1);
+}
+
Index: udivmod.c
===
--- udivmod.c   (revision 265276)
+++ udivmod.c   (working copy)
@@ -21,17 +21,18 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
-long udivmodsi4 ();
+extern unsigned long __udivmodsi4(unsigned long num, unsigned long den,
+ int 
modwanted);
 
 long
 __udivsi3 (long a, long b)
 {
-  return udivmodsi4 (a,

Re: [PATCH] add udivhi3, umodhi3 functions to libgcc

2018-10-17 Thread Paul Koning
This is a revision of a patch I proposed a while back, to add udivhi3 and 
umodhi3 functions to libgcc since some platforms (like pdp11) need it.  The 
code is adopted from that of udivsi3.

In earlier discussion it was pointed out that internal functions need to start 
with __.  The code I had copied does not do that, so I corrected mine and also 
changed the existing code to conform to the rules.

Ok for trunk?

paul

ChangeLog:

2018-10-17  Paul Koning  

* udivmodsi4.c (__udivmodsi4): Rename to conform to coding
standard.
* udivmod.c: Update references to __udivmodsi4.
* udivhi3.c: New file.
* udivmodhi4.c: New file.
* config/pdp11/t-pdp11 (LIB2ADD): Add the new files.

Index: config/pdp11/t-pdp11
===
--- config/pdp11/t-pdp11(revision 265151)
+++ config/pdp11/t-pdp11(working copy)
@@ -1,5 +1,7 @@
 LIB2ADD = $(srcdir)/udivmod.c \
  $(srcdir)/udivmodsi4.c \
+ $(srcdir)/udivhi3.c \
+ $(srcdir)/udivmodhi4.c \
  $(srcdir)/memcmp.c \
  $(srcdir)/memcpy.c \
  $(srcdir)/memmove.c \
Index: udivhi3.c
===
--- udivhi3.c   (nonexistent)
+++ udivhi3.c   (working copy)
@@ -0,0 +1,38 @@
+/* Copyright (C) 2000-2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+extern unsigned short __udivmodhi4(unsigned short num, unsigned short den,
+  int 
modwanted);
+
+unsigned short
+__udivhi3 (unsigned short a, unsigned short b)
+{
+  return __udivmodhi4 (a, b, 0);
+}
+
+unsigned short
+__umodhi3 (unsigned short a, unsigned short b)
+{
+  return __udivmodhi4 (a, b, 1);
+}
+
Index: udivmod.c
===
--- udivmod.c   (revision 265151)
+++ udivmod.c   (working copy)
@@ -21,17 +21,18 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
-long udivmodsi4 ();
+extern unsigned long __udivmodsi4(unsigned long num, unsigned long den,
+ int 
modwanted);
 
 long
 __udivsi3 (long a, long b)
 {
-  return udivmodsi4 (a, b, 0);
+  return __udivmodsi4 (a, b, 0);
 }
 
 long
 __umodsi3 (long a, long b)
 {
-  return udivmodsi4 (a, b, 1);
+  return __udivmodsi4 (a, b, 1);
 }
 
Index: udivmodhi4.c
===
--- udivmodhi4.c(nonexistent)
+++ udivmodhi4.c(working copy)
@@ -0,0 +1,47 @@
+/* Copyright (C) 2000-2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+unsigned short
+__udivmodhi4(unsigned short num, unsigned short den, int modwanted)
+{
+  unsigned short bit = 1;
+  unsigned short res = 0;
+
+  while (den < num && bit && !(den & (1L<<31)))
+{
+  den <<=1;
+  bit <<=1;
+}
+  while (bit)
+{
+  if (num >= den)
+   {
+ num -= den;
+ res |= bit;
+   }
+  

[PATCH, pdp11] fix problem with doloop_end

2018-10-12 Thread Paul Koning
For some newlib sources, pdp11 doloop_end was creating problems because it was 
handed a QImode operand when it only wants HImode.  This patch cures that, and 
also adds addqi3 and subqi3 patterns to help with that case.

Committed.

paul

ChangeLog:

2018-10-12  Paul Koning  

* config/pdp11/pdp11.md (doloop_end): New expander.
(doloop_end_insn): renamed from "doloop_end".
(addqi3): New pattern.
(subqi3): New pattern.
* config/pdp11/predicates.md (incdec_operand): New predicate.

Index: config/pdp11/predicates.md
===
--- config/pdp11/predicates.md  (revision 265131)
+++ config/pdp11/predicates.md  (revision 265132)
@@ -30,6 +30,14 @@
   (and (match_code "const_int")
(match_test "(unsigned) INTVAL (op) < 4")))
 
+;; Accept integer arguments +1 and -1, for which add and sub can be
+;; done as inc or dec instructions.  This matches the rule for the
+;; L and M constraints.
+(define_predicate "incdec_operand"
+  (and (match_code "const_int")
+   (ior (match_test "INTVAL (op) == -1")
+   (match_test "INTVAL (op) == 1"
+
 ;; Accept anything general_operand accepts, except that registers must
 ;; be FPU registers.
 (define_predicate "float_operand"
Index: config/pdp11/pdp11.md
===
--- config/pdp11/pdp11.md   (revision 265131)
+++ config/pdp11/pdp11.md   (revision 265132)
@@ -251,9 +251,28 @@
 
 ;; sob instruction
 ;;
-;; Do a define_expand because some alternatives clobber CC.
+;; This expander has to check for mode match because the doloop pass
+;; in gcc that invokes it does not do so, i.e., it may attempt to apply
+;; this pattern even if the count operand is QI or SI mode.
+(define_expand "doloop_end"
+  [(parallel [(set (pc)
+  (if_then_else
+   (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
+   (const_int 1))
+   (label_ref (match_operand 1 "" ""))
+   (pc)))
+ (set (match_dup 0)
+  (plus:HI (match_dup 0)
+(const_int -1)))])]
+  "TARGET_40_PLUS"
+  "{
+if (GET_MODE (operands[0]) != HImode)
+  FAIL;
+  }")
+
+;; Do a define_split because some alternatives clobber CC.
 ;; Some don't, but it isn't all that interesting to cover that case.
-(define_insn_and_split "doloop_end"
+(define_insn_and_split "doloop_end_insn"
   [(set (pc)
(if_then_else
 (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
@@ -1067,6 +1086,35 @@
 }"
   [(set_attr "length" "2,4,4,6")])
 
+(define_insn_and_split "addqi3"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR,Q")
+   (plus:QI (match_operand:QI 1 "general_operand" "%0,0")
+(match_operand:QI 2 "incdec_operand" "LM,LM")))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0)
+  (plus:QI (match_dup 1) (match_dup 2)))
+ (clobber (reg:CC CC_REGNUM))])]
+  ""
+  [(set_attr "length" "2,4")])
+
+;; Inc/dec sets V if overflow from the operation
+(define_insn "*addqi3"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR,Q")
+   (plus:QI (match_operand:QI 1 "general_operand" "%0,0")
+(match_operand:QI 2 "incdec_operand" "LM,LM")))
+   (clobber (reg:CC CC_REGNUM))]
+  "reload_completed"
+  "*
+{
+  if (INTVAL(operands[2]) == 1)
+return \"incb\t%0\";
+  else
+return \"decb\t%0\";
+}"
+  [(set_attr "length" "2,4")])
+
 

 ;;- subtract instructions
 ;; we don't have to care for constant second 
@@ -1226,6 +1274,35 @@
 }"
   [(set_attr "length" "2,4,4,6")])
 
+(define_insn_and_split "subqi3"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR,Q")
+   (plus:QI (match_operand:QI 1 "general_operand" "%0,0")
+(match_operand:QI 2 "incdec_operand" "LM,LM")))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0)
+  (plus:QI (match_dup 1) (match_dup 2)))
+ (clobber (reg:CC CC_REGNUM))])]
+  ""
+  [(set_attr "length" "2,4")])
+
+;; Inc/dec sets V if overflow from the operation
+(define_insn "*subqi3"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR,Q")
+   (plus:QI (match_operand:QI 1 "general_operand" "%0,0")
+(match_operand:QI 2 "incdec_operand" "LM,LM")))
+   (clobber (reg:CC CC_REGNUM))]
+  "reload_completed"
+  "*
+{
+  if (INTVAL(operands[2]) == -1)
+return \"incb\t%0\";
+  else
+return \"decb\t%0\";
+}"
+  [(set_attr "length" "2,4")])
+
 - and instructions
 ;; Bit-and on the pdp (like on the VAX) is done with a clear-bits insn.
 



Re: [PATCH, doc] describe mode checking for doloop_end pattern

2018-10-12 Thread Paul Koning



> On Oct 11, 2018, at 11:01 PM, Jeff Law  wrote:
> 
> On 10/11/18 3:09 PM, Paul Koning wrote:
>> Updated with an additional item I just debugged.
>> 
>> Since the code that uses the doloop_end pattern does not check the operand 
>> mode as given in the pattern, the pattern itself may need to do this, and 
>> that was not documented.  In addition, if the doloop_end pattern is a 
>> define_expand, there must be a define_insn (or define_insn_and_split) 
>> matching the generated pattern.  I had a define_split instead, and the 
>> result was an ICE in loop optimization (loop2_done pass).
>> 
>> This patch adds that information.  It also updates the example to reflect 
>> this.
>> 
>> Ok for trunk?
>> 
>>  paul
>> 
>> ChangeLog:
>> 
>> 2018-10-11  Paul Koning  
>> 
>>  * doc/md.texi (doloop_end): Document that the pattern code may
>>  need to check operand mode.
> OK.
> jeff

Thanks.  Committed.

paul



[PATCH, doc] describe mode checking for doloop_end pattern

2018-10-11 Thread Paul Koning
Updated with an additional item I just debugged.

Since the code that uses the doloop_end pattern does not check the operand mode 
as given in the pattern, the pattern itself may need to do this, and that was 
not documented.  In addition, if the doloop_end pattern is a define_expand, 
there must be a define_insn (or define_insn_and_split) matching the generated 
pattern.  I had a define_split instead, and the result was an ICE in loop 
optimization (loop2_done pass).

This patch adds that information.  It also updates the example to reflect this.

Ok for trunk?

paul

ChangeLog:

2018-10-11  Paul Koning  

* doc/md.texi (doloop_end): Document that the pattern code may
need to check operand mode.

Index: md.texi
===
--- md.texi (revision 265042)
+++ md.texi (working copy)
@@ -7619,7 +7619,23 @@ simplified) from the PDP-11 target:
 
 @smallexample
 @group
-(define_insn "doloop_end"
+(define_expand "doloop_end"
+  [(parallel [(set (pc)
+   (if_then_else
+(ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
+(const_int 1))
+(label_ref (match_operand 1 "" ""))
+(pc)))
+  (set (match_dup 0)
+   (plus:HI (match_dup 0)
+ (const_int -1)))])]
+  ""
+  "@{
+if (GET_MODE (operands[0]) != HImode)
+  FAIL;
+  @}")
+
+(define_insn "doloop_end_insn"
   [(set (pc)
 (if_then_else
  (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
@@ -7662,10 +7678,23 @@ will be non-negative.
 Since the @code{doloop_end} insn is a jump insn that also has an output,
 the reload pass does not handle the output operand.  Therefore, the
 constraint must allow for that operand to be in memory rather than a
-register.  In the example shown above, that is handled by using a loop
-instruction sequence that can handle memory operands when the memory
-alternative appears.
+register.  In the example shown above, that is handled (in the
+@code{doloop_end_nocc} pattern) by using a loop instruction sequence
+that can handle memory operands when the memory alternative appears.
 
+GCC does not check the mode of the loop register operand when generating
+the @code{doloop_end} pattern.  If the pattern is only valid for some
+modes but not others, the pattern should be a @code{define_expand}
+pattern that checks the operand mode in the preparation code, and issues
+@code{FAIL} if an unsupported mode is found.  The example above does
+this, since the machine instruction to be used only exists for
+@code{HImode}.
+
+If the @code{doloop_end} pattern is a @code{define_expand}, there must
+also be a @code{define_insn} or @code{define_insn_and_split} matching
+the generated pattern.  Otherwise, the compiler will fail during loop
+optimization.
+
 @end ifset
 @ifset INTERNALS
 @node Insn Canonicalizations



[PATCH, doc] describe mode checking for doloop_end pattern

2018-10-11 Thread Paul Koning
Since the code that uses the doloop_end pattern does not check the operand mode 
as given in the pattern, the pattern itself may need to do this, and that was 
not documented.  This patch adds that information.  It also updates the example 
to reflect this.

Ok for trunk?

paul

ChangeLog:

2018-10-11  Paul Koning  

* doc/md.texi (doloop_end): Document that the pattern code may
need to check operand mode.

Index: doc/md.texi
===
--- doc/md.texi (revision 265042)
+++ doc/md.texi (working copy)
@@ -7619,7 +7619,23 @@ simplified) from the PDP-11 target:
 
 @smallexample
 @group
-(define_insn "doloop_end"
+(define_expand "doloop_end"
+  [(parallel [(set (pc)
+   (if_then_else
+(ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
+(const_int 1))
+(label_ref (match_operand 1 "" ""))
+(pc)))
+  (set (match_dup 0)
+   (plus:HI (match_dup 0)
+ (const_int -1)))])]
+  "TARGET_40_PLUS"
+  "@{
+if (GET_MODE (operands[0]) != HImode)
+  FAIL;
+  @}")
+
+(define_insn "doloop_end_nocc"
   [(set (pc)
 (if_then_else
  (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
@@ -7628,17 +7644,28 @@ simplified) from the PDP-11 target:
  (pc)))
(set (match_dup 0)
 (plus:HI (match_dup 0)
-  (const_int -1)))]
-  ""
-  
+  (const_int -1)))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_40_PLUS && reload_completed"
+  "*
   @{
+rtx lb[1];
+   
 if (which_alternative == 0)
-  return "sob %0,%l1";
+   return \"sob\t%0,%l1\";
+   
+/* emulate sob */
+lb[0] = gen_label_rtx ();
+output_asm_insn (\"dec\t%0\", operands);
+output_asm_insn (\"beq\t%l0\", lb);
+output_asm_insn (\"jmp\t%l1\", operands);
+
+output_asm_label (lb[0]);
+fputs (\":\\n\", asm_out_file);
+   
+return \"\";
+  @}")
 
-/* emulate sob */
-output_asm_insn ("dec %0", operands);
-return "bne %l1";
-  @})
 @end group
 @end smallexample
 
@@ -7662,10 +7689,18 @@ will be non-negative.
 Since the @code{doloop_end} insn is a jump insn that also has an output,
 the reload pass does not handle the output operand.  Therefore, the
 constraint must allow for that operand to be in memory rather than a
-register.  In the example shown above, that is handled by using a loop
-instruction sequence that can handle memory operands when the memory
-alternative appears.
+register.  In the example shown above, that is handled (in the
+@code{doloop_end_nocc} pattern) by using a loop instruction sequence
+that can handle memory operands when the memory alternative appears.
 
+GCC does not check the mode of the loop register operand when generating
+the @code{doloop_end} pattern.  If the pattern is only valid for some
+modes but not others, the pattern should be a @code{define_expand}
+pattern that checks the operand mode in the preparation code, and issues
+@code{FAIL} if an unsupported mode is found.  The example above does
+this, since the machine instruction to be used only exists for
+@code{HImode}.
+
 @end ifset
 @ifset INTERNALS
 @node Insn Canonicalizations



Re: doloop insn generated incorrectly (wrong mode)

2018-10-11 Thread Paul Koning



> On Oct 11, 2018, at 3:11 AM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Wed, Oct 10, 2018 at 08:55:12PM -0400, Paul Koning wrote:
> 
> [ snip ]
> 
>> ...
>> Why is this happening, and how can I fix it (short of removing the 
>> doloop_end pattern)?  I see a comment in loop-doloop.c about handling a FAIL 
>> of the pattern -- am I going to have to check that the wrong mode is being 
>> passed in and FAIL if so?
> 
> That is exactly what other targets do.  Maybe you can improve doloop so
> this isn't necessary anymore?  :-)

Maybe.  For now I'll take the expand and check approach.  And propose a doc 
patch to explain that this is needed, because it's an unusual situation. 

paul

doloop insn generated incorrectly (wrong mode)

2018-10-10 Thread Paul Koning
I have a doloop_end pattern in pdp11.md which looks like this:

(define_insn_and_split "doloop_end"
  [(set (pc)
(if_then_else
 (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m")
 (const_int 1))
 (label_ref (match_operand 1 "" ""))
 (pc)))
   (set (match_dup 0)
(plus:HI (match_dup 0)
 (const_int -1)))]
  "TARGET_40_PLUS"
  "#"
...

And the loop2 dump file shows this insn sequence:

(insn 1417 1415 1418 282 (set (reg:HI 565)
(plus:HI (subreg:HI (reg/v:QI 269 [ infcount ]) 0)
(const_int -1 [0x]))) 
"../../../../../combined/newlib/libc/stdio/vfscanf.c":1474:18 62 {addhi3}
 (nil))
(insn 1418 1417 1419 282 (set (reg/v:QI 292 [ infcount ])
(subreg:QI (reg:HI 565) 0)) 
"../../../../../combined/newlib/libc/stdio/vfscanf.c":1474:18 20 {movqi}
 (expr_list:REG_DEAD (reg:HI 565)
(nil)))
(jump_insn 1419 1418 1423 282 (set (pc)
(if_then_else (ne (reg/v:QI 269 [ infcount ])
(const_int 3 [0x3]))
(label_ref 1430)
(pc))) "../../../../../combined/newlib/libc/stdio/vfscanf.c":1474:9 
12 {cbranchqi4}
 (int_list:REG_BR_PROB 955630228 (nil))
 -> 1430)

which is turned into this (in loop2_doloop):

and ends up like this:

(jump_insn 2158 1447 2111 285 (parallel [
(set (pc)
(if_then_else (ne (reg:QI 647)
(const_int 1 [0x1]))
(label_ref 2111)
(pc)))
(set (reg:QI 647)
(plus:HI (reg:QI 647)
(const_int -1 [0x])))
]) "../../../../../combined/newlib/libc/stdio/vfscanf.c":1474:9 -1
 (int_list:REG_BR_PROB 955630228 (nil))
 -> 2111)

Note that this isn't permitted by the .md file -- the mode is wrong (QI not 
HI).  The result is an ICE in cfgrtl.c line 1275:

  /* If the substitution doesn't succeed, die.  This can happen
 if the back end emitted unrecognizable instructions or if
 target is exit block on some arches.  */
  if (!redirect_jump (as_a  (insn),
  block_label (new_bb), 0))
{
  gcc_assert (new_bb == EXIT_BLOCK_PTR_FOR_FN (cfun));
  return false;
}

It's not obvious to me how that relates to the wrong insn generated by the 
compiler, but clearly it can't be good to have something that won't match when 
it's time to generate the output code.  It seems to me that the doloop 
convertor should be checking the doloop_end pattern to make sure the mode 
matches, and not apply it unless it does.  

Why is this happening, and how can I fix it (short of removing the doloop_end 
pattern)?  I see a comment in loop-doloop.c about handling a FAIL of the 
pattern -- am I going to have to check that the wrong mode is being passed in 
and FAIL if so?

paul



Re: ICE building a libsupc++ file, pdp11 target

2018-10-09 Thread Paul Koning



> On Jul 17, 2018, at 9:36 AM, Richard Biener  
> wrote:
> 
> On Tue, Jul 17, 2018 at 3:08 PM Paul Koning  wrote:
>> 
>> 
>>> On Jul 17, 2018, at 5:46 AM, Richard Biener  
>>> wrote:
>>> 
>>>> ...
>>> 
>>> There is not enough information for anyone to help you without
>>> reproducing the issue which is maybe too much to ask for ;)
>>> 
>>> Can you debug_tree () the offending decl in gdb?
>> 
>> Yes, here it is.  I don't know anything about debugging in this area, so 
>> tools like debug_tree are good to learn about.  How would I interpret its 
>> output?
>> 
>> pkoning:gcc pkoning$ lldb ./cc1plus -- new_opa.ii -fno-implicit-templates 
>> -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
>> -fdiagnostics-show-location=once -frandom-seed=new_opa.lo -g -O2 -std=gnu++1z
>> (lldb) target create "./cc1plus"
>> Current executable set to './cc1plus' (x86_64).
>> ...
>> Process 10880 stopped
>> * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
>>frame #0: 0x000100c21378 cc1plus`internal_error(gmsgid="in %s, at 
>> %s:%d") at diagnostic.c:1441 [opt]
>>   1438 internal_error (const char *gmsgid, ...)
>>   1439 {
>>   1440   va_list ap;
>> -> 1441   va_start (ap, gmsgid);
>>   1442   rich_location richloc (line_table, input_location);
>>   1443   diagnostic_impl (, -1, gmsgid, , DK_ICE);
>>   1444   va_end (ap);
>> Target 0: (cc1plus) stopped.
>> (lldb) frame sel 2
>> frame #2: 0x000100074b36 
>> cc1plus`import_export_decl(decl=0x00014269c750) at decl2.c:2877 [opt]
>>   2874   gcc_assert (VAR_OR_FUNCTION_DECL_P (decl));
>>   2875   /* Any code that creates entities with TREE_PUBLIC cleared should
>>   2876  also set DECL_INTERFACE_KNOWN.  */
>> -> 2877   gcc_assert (TREE_PUBLIC (decl));
>>   2878   if (TREE_CODE (decl) == FUNCTION_DECL)
>>   2879 gcc_assert (DECL_IMPLICIT_INSTANTIATION (decl)
>>   2880 || DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION (decl)
>> (lldb) call debug_tree(decl)
>> >type >size 
>>unit-size 
>>align:8 warn_if_not_align:0 symtab:150 alias-set -1 canonical-type 
>> 0x1426aa5e8 precision:1 min  max > 0x142502a08 1>>
>>readonly constant used static tree_1 tree_2 tree_3 unsigned nonlocal 
>> in_system_header read decl_1 QI 
>> /Users/pkoning/Documents/svn/buildpdp/pdp11-aout/libstdc++-v3/include/type_traits:59:28
>>  size  unit-size 
>>align:8 warn_if_not_align:0 context > integral_constant> initial 
>>template-info 0x1426a64e0 chain >
>> (lldb)
> 
> lldb? eh ... ;)
> 
> anyhow, this is
> 
> namespace std
> {
> 
> # 56 
> "/Users/pkoning/Documents/svn/buildpdp/pdp11-aout/libstdc++-v3/include/type_traits"
> 3
>  template
>struct integral_constant
>{
>  static constexpr _Tp value = __v;
> ^^^
> 
> which should have TREE_PUBLIC set.  My next step would be to watch how
> this flag changes (if it does...)
> 
> break at ggc-page.c:1442 (the return stmt of ggc_internal_alloc)
> conditional on result == 0x14269c750
> and then watch *>base.public_flag printing said flag when
> the watchpoint hits
> (because you're watching the whole integer containing the bitfield bit).
> 
> If that doesn't go anywhere try reducing the source file using creduce
> or by other means.
> 
> Maybe look at reset_decl_linkage () and visibility support in general.

I trimmed the file a bit.

Managed to find where public_flag is cleared.  It is in cp/expr.c 
maybe_commonize_var, line 5619, here:

  else
{
  /* While for initialized variables, we must use internal
 linkage -- which means that multiple copies will not
 be merged.  */
  TREE_PUBLIC (decl) = 0;
  DECL_COMMON (decl) = 0;

Could it be related to the fact that I have an a.out (rather than ELF) target?

paul



Building with old gcc

2018-10-09 Thread Paul Koning
I'm trying to build the current code on Linux with GCC 4.3.2 (stock compiler in 
Fedora 10 which is my old test system).  It fails like this:

In file included from 
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/tree-data-ref.h:27,
 from 
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/gimple-loop-interchange.cc:44:
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/opt-problem.h: In constructor 
‘opt_result::opt_result(bool, opt_problem*)’:
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/opt-problem.h:217: error: class 
‘opt_result’ does not have any field named ‘opt_wrapper’
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/opt-problem.h:217: error: no matching 
function for call to ‘opt_wrapper::opt_wrapper()’
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/opt-problem.h:160: note: candidates 
are: opt_wrapper::opt_wrapper(T, opt_problem*) [with T = bool]
/mnt/hgfs/pkoning/Documents/svn/gcc/gcc/opt-problem.h:147: note:
 opt_wrapper::opt_wrapper(const opt_wrapper&)
make[3]: *** [gimple-loop-interchange.o] Error 1

Is 9.0 supposed to build with a build compiler this old?  The documentation on 
the GCC web page says an ISO C++ compiler is required but it doesn't tell me 
whether the C++ compiler in GCC 4.3.2 is adequate.  Since many people will be 
installing GCC using an older version of GCC, it would be good for the 
documentation to state what the minimum version of GCC is.

paul



Re: movmem pattern and missed alignment

2018-10-08 Thread Paul Koning



> On Oct 8, 2018, at 5:43 PM, Eric Botcazou  wrote:
> 
>> That's correct, I was explaining from the middle-end perspective.  There
>> we are consciously more lenient as we have to support the real world and
>> other languages than C.  This is one of the cases.
> 
> This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK 
> for every language on strict-alignment platforms.  This was changed only 
> because of SSE on x86.
> 
> -- 
> Eric Botcazou

So does that mean this should be a target-specific behavior, but it isn't at 
the moment?

paul



Re: movmem pattern and missed alignment

2018-10-08 Thread Paul Koning



> On Oct 8, 2018, at 1:29 PM, Andrew Haley  wrote:
> 
> On 10/08/2018 06:20 PM, Michael Matz wrote:
>> Only if you somewhere visibly add accesses to *i and *j.  Without them you 
>> only have the "accesses" via memcpy, and as Richi says, those don't imply 
>> any alignment requirements.  The i and j pointers might validly be char* 
>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's 
>> nothing in your small example program from which GCC can infer that those 
>> two global pointers are in fact 2-aligned.
> 
> So all you'd actually have to say is
> 
> void f1(void)
> {
>*i; *j;
>__builtin_memcpy (i, j, 32);
> }

No, that doesn't help.  Not even if I make it:

void f1(void)
{
k = *i + *j;
__builtin_memcpy (i, j, 4);
}

The first line does word aligned references to *i and *j, but the memcpy 
stubbornly remains a byte move.

paul



[PATCH, pdp11] libgcc: remove -mfloat32

2018-10-08 Thread Paul Koning
I missed a file that needed to be updated for the removal of -mfloat32.

Committed.

paul

ChangeLog:

2018-10-08  Paul Koning  

* config/pdp11/t-pdp11: Remove -mfloat32 switch.

Index: config/pdp11/t-pdp11
===
--- config/pdp11/t-pdp11(revision 264938)
+++ config/pdp11/t-pdp11(working copy)
@@ -5,4 +5,4 @@ LIB2ADD = $(srcdir)/udivmod.c \
  $(srcdir)/memmove.c \
  $(srcdir)/memset.c
 
-HOST_LIBGCC2_CFLAGS += -O2 -mfloat32
+HOST_LIBGCC2_CFLAGS += -O2



Re: movmem pattern and missed alignment

2018-10-08 Thread Paul Koning



> On Oct 8, 2018, at 11:09 AM, Richard Biener  
> wrote:
> 
> On Mon, Oct 8, 2018 at 3:57 PM Paul Koning  wrote:
>> 
>> I have a movmem pattern in my target that pays attention to the alignment 
>> argument.
>> 
>> GCC isn't passing in the expected alignment part of the time.  I have this 
>> test case:
>> 
>> extern int *i, *j;
>> extern int iv[40], jv[40];
>> 
>> void f1(void)
>> {
>>__builtin_memcpy (i, j, 32);
>> }
>> 
>> void f2(void)
>> {
>>__builtin_memcpy (iv, jv, 32);
>> }
>> 
>> When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 
>> (int is 2 bytes in pdp11) as expected.
>> 
>> The compiler clearly knows that int* points to aligned data, since it 
>> generates instructions that assume alignment (this is a strict-alignment 
>> target) when I dereference the pointer.  But somehow it gets it wrong for 
>> block move.
>> 
>> I also see this for the individual move operations that are generated for 
>> very short memcpy operations; if the count is 4, I get four move byte 
>> operations for f1, but two move word operations for f2.
>> 
>> This seems like a bug.  Am I missing something?
> 
> Yes, memcpy doesn't require anything bigger than byte alignment and
> GCC infers alignemnt
> only from actual memory references or from declarations (like iv /
> jv).  For i and j there
> are no dereferences and thus you get alignment of 1.
> 
> Richard.

Ok, but why is that not a bug?  The whole point of passing alignment to the 
movmem pattern is to let it generate code that takes advantage of the 
alignment.  So we get a missed optimization.

paul



[PATCH, pdp11] Fix LRA failure

2018-10-08 Thread Paul Koning
This patch fixes a failure handling block moves when the LRA register allocator 
is used. 

Committed.

paul

ChangeLog:

2018-10-08  Paul Koning  

* config/pdp11/pdp11-protos.h (output_block_move): Remove.
(expand_block_move): New function.
* config/pdp11/pdp11.c (output_block_move): Remove.
(expand_block_move): New function.
* config/pdp11/pdp11.h (MOVE_RATIO): New definition.
* config/pdp11/pdp11.md (movmemhi): Use expand_block_move.
(*movmemhi1): Remove.

Index: config/pdp11/pdp11-protos.h
===
--- config/pdp11/pdp11-protos.h (revision 264929)
+++ config/pdp11/pdp11-protos.h (revision 264930)
@@ -26,7 +26,7 @@ extern int legitimate_const_double_p (rtx);
 extern void notice_update_cc_on_set (rtx, rtx);
 extern void output_addr_const_pdp11 (FILE *, rtx);
 extern const char *output_move_multiple (rtx *);
-extern const char *output_block_move (rtx *);
+extern void expand_block_move (rtx *);
 extern const char *output_jump (rtx *, int, int);
 extern void print_operand_address (FILE *, rtx);
 typedef enum { no_action, dec_before, inc_after } pdp11_action;
Index: config/pdp11/pdp11.md
===
--- config/pdp11/pdp11.md   (revision 264929)
+++ config/pdp11/pdp11.md   (revision 264930)
@@ -570,48 +570,20 @@
   clrf\t%0"
   [(set_attr "length" "2,2,4,4,2")])
 
-;; maybe fiddle a bit with move_ratio, then 
-;; let constraints only accept a register ...
-
+;; Expand a block move.  We turn this into a move loop.
 (define_expand "movmemhi"
-  [(parallel [(set (match_operand:BLK 0 "general_operand" "=g,g")
-  (match_operand:BLK 1 "general_operand" "g,g"))
- (use (match_operand:HI 2 "general_operand" "n,mr"))
- (use (match_operand:HI 3 "immediate_operand" "i,i"))
- (clobber (match_scratch:HI 6 "=,X"))
- (clobber (match_dup 4))
- (clobber (match_dup 5))
- (clobber (match_dup 2))])]
+  [(match_operand:BLK 0 "general_operand" "=g")
+   (match_operand:BLK 1 "general_operand" "g")
+   (match_operand:HI 2 "immediate_operand" "i")
+   (match_operand:HI 3 "immediate_operand" "i")]
   ""
   "
 {
-  operands[0]
-= replace_equiv_address (operands[0],
-copy_to_mode_reg (Pmode, XEXP (operands[0], 0)));
-  operands[1]
-= replace_equiv_address (operands[1],
-copy_to_mode_reg (Pmode, XEXP (operands[1], 0)));
-
-  operands[4] = XEXP (operands[0], 0);
-  operands[5] = XEXP (operands[1], 0);
+  if (INTVAL (operands[2]) != 0)
+expand_block_move (operands);
+  DONE;
 }")
 
-
-(define_insn "*movmemhi1"
-  [(set (mem:BLK (match_operand:HI 0 "register_operand" "r,r"))
-   (mem:BLK (match_operand:HI 1 "register_operand" "r,r")))
-   (use (match_operand:HI 2 "general_operand" "n,r"))
-   (use (match_operand:HI 3 "immediate_operand" "i,i"))
-   (clobber (match_scratch:HI 4 "=,X"))
-   (clobber (match_dup 0))
-   (clobber (match_dup 1))
-   (clobber (match_dup 2))]
-  ""
-  "* return output_block_move (operands);"
-;;; just a guess
-  [(set_attr "length" "80")])
-   
-
 

 ;;- truncation instructions
 
Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 264929)
+++ config/pdp11/pdp11.c(revision 264930)
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "builtins.h"
 #include "dbxout.h"
+#include "explow.h"
 #include "expmed.h"
 
 /* This file should be included last.  */
@@ -1513,173 +1514,48 @@ no_side_effect_operand(rtx op, machine_mode mode A
 
 
 /*
- * output a block move:
+ * expand a block move:
  *
  * operands[0] ... to
  * operands[1]  ... from
  * operands[2]  ... length
  * operands[3]  ... alignment
- * operands[4]  ... scratch register
  */
 
- 
-const char *
-output_block_move(rtx *operands)
+void
+expand_block_move(rtx *operands)
 {
-static int count = 0;
-char buf[200];
-int unroll;
-int lastbyte = 0;
-
-/* Move of zero bytes is a NOP.  */
-if (operands[2] == const0_rtx)
-  return "";
-
-/* Look for moves by small constant byte counts, those we'll
-   expand to straight line code.  */
-if (CONSTANT_P (operands[2]))
-{
-   if (INTVAL (operands[2]) < 16
-   && (!optimize_size || INTVAL (operands[2]) < 5)
-   && INTVA

movmem pattern and missed alignment

2018-10-08 Thread Paul Koning
I have a movmem pattern in my target that pays attention to the alignment 
argument.

GCC isn't passing in the expected alignment part of the time.  I have this test 
case:

extern int *i, *j;
extern int iv[40], jv[40];

void f1(void)
{
__builtin_memcpy (i, j, 32);
}

void f2(void)
{
__builtin_memcpy (iv, jv, 32);
}

When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 (int 
is 2 bytes in pdp11) as expected.

The compiler clearly knows that int* points to aligned data, since it generates 
instructions that assume alignment (this is a strict-alignment target) when I 
dereference the pointer.  But somehow it gets it wrong for block move.

I also see this for the individual move operations that are generated for very 
short memcpy operations; if the count is 4, I get four move byte operations for 
f1, but two move word operations for f2.  

This seems like a bug.  Am I missing something?

paul



[PATCH, pdp11] remove -mfloat32, -mfloat64

2018-10-05 Thread Paul Koning
This patch removes switches that allow the size of "float" to be either the 
usual 4, or 8 -- which is also the size of "double".  That second choice 
creates problems for Fortran and violates the Fortran standard.  I don't see a 
reason for having the option; it certainly is not a familiar thing to do on 
this machine.

Committed.

paul

ChangeLog:

2018-10-05  Paul Koning  

* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Always 32.
* config/pdp11/pdp11.opt (mfloat32): Remove.
(mfloat64): Remove.
* doc/invoke.texi (pdp11 -mfloat32): Remove:
(pdp11 -mfloat64): Remove.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 264880)
+++ doc/invoke.texi (revision 264881)
@@ -1007,7 +1007,6 @@ Objective-C and Objective-C++ Dialects}.
 @emph{PDP-11 Options}
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
 -mint32  -mno-int16 -mint16  -mno-int32 @gol
--mfloat32  -mno-float64 -mfloat64  -mno-float32 @gol
 -msplit -munix-asm  -mdec-asm -mgnu-asm -mlra}
 
 @emph{picoChip Options}
@@ -22722,18 +22721,6 @@ Use 16-bit @code{int}.  This is the default.
 @opindex mno-int16
 Use 32-bit @code{int}.
 
-@item -mfloat64
-@itemx -mno-float32
-@opindex mfloat64
-@opindex mno-float32
-Use 64-bit @code{float}.  This is the default.
-
-@item -mfloat32
-@itemx -mno-float64
-@opindex mfloat32
-@opindex mno-float64
-Use 32-bit @code{float}.
-
 @item -msplit
 @opindex msplit
 Target has split instruction and data space.  Implies -m45.
Index: config/pdp11/pdp11.opt
===
--- config/pdp11/pdp11.opt  (revision 264880)
+++ config/pdp11/pdp11.opt  (revision 264881)
@@ -42,14 +42,6 @@ mgnu-asm
 Target RejectNegative Report Mask(GNU_ASM) Negative(munix-asm)
 Use the GNU assembler syntax.
 
-mfloat32
-Target Report Mask(FLOAT32)
-Use 32 bit float.
-
-mfloat64
-Target Report InverseMask(FLOAT32, FLOAT64)
-Use 64 bit float.
-
 mfpu
 Target RejectNegative Report Mask(FPU)
 Use hardware floating point.
Index: config/pdp11/pdp11.h
===
--- config/pdp11/pdp11.h(revision 264880)
+++ config/pdp11/pdp11.h(revision 264881)
@@ -59,12 +59,14 @@ along with GCC; see the file COPYING3.  If not see
 #define LONG_TYPE_SIZE 32
 #define LONG_LONG_TYPE_SIZE64 
 
-/* if we set FLOAT_TYPE_SIZE to 32, we could have the benefit 
-   of saving core for huge arrays - the definitions are 
-   already in md - but floats can never reside in 
-   an FPU register - we keep the FPU in double float mode 
-   all the time !! */
-#define FLOAT_TYPE_SIZE(TARGET_FLOAT32 ? 32 : 64)
+/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64,
+   but that conflicts with Fortran language rules.  Since there is no
+   obvious reason why we should have that feature -- other targets
+   generally don't have float and double the same size -- I've removed
+   it.  Note that it continues to be true (for now) that arithmetic is
+   always done with 64-bit values, i.e., the FPU is always in "double"
+   mode.  */
+#define FLOAT_TYPE_SIZE32
 #define DOUBLE_TYPE_SIZE   64
 #define LONG_DOUBLE_TYPE_SIZE  64
 
@@ -200,12 +202,11 @@ extern const struct real_format pdp11_d_format;
 
 MUL_REGS are used for odd numbered regs, to use in 16-bit multiplication
  (even numbered do 32-bit multiply)
-LMUL_REGS long multiply registers (even numbered regs )
- (don't need them, all 32-bit regs are even numbered!)
 GENERAL_REGS is all cpu
 LOAD_FPU_REGS is the first four cpu regs, they are easier to load
 NO_LOAD_FPU_REGS is ac4 and ac5, currently - difficult to load them
 FPU_REGS is all fpu regs 
+CC_REGS is the condition codes (CPU and FPU)
 */
 
 enum reg_class



Re: blkmov and alignment

2018-10-05 Thread Paul Koning



> On Oct 5, 2018, at 12:21 PM, Richard Biener  
> wrote:
> 
> On October 5, 2018 4:17:53 PM GMT+02:00, Paul Koning  
> wrote:
>> The documentation says that argument 4 of the blkmov insn gives the
>> alignment, for example 4 if things are word-aligned.
>> 
>> It's documented that, say, the value 4 means source and destination are
>> multiples of 4.  What isn't clear is whether the length is also a
>> multiple of 4 in this case.  In other words, does "aligned" state the
>> alignment of source and destination and length, or only of the
>> addresses and the length may still be odd?
> 
> It only applies to the address alignment but you may want to check out the 
> expander. 

You mean emit_block_move_via_loop in expr.c?  That has a comment saying it 
would be nice to move units larger than QImode.  Yes, and that's just what I 
was looking to do.

I suppose I could write an expander that produces a different insn when the 
size is also the correct multiple, including the conversion from byte count to 
word count.  That seems like an obvious common optimization so having to do it 
in target code is a bit odd.

paul



blkmov and alignment

2018-10-05 Thread Paul Koning
The documentation says that argument 4 of the blkmov insn gives the alignment, 
for example 4 if things are word-aligned.

It's documented that, say, the value 4 means source and destination are 
multiples of 4.  What isn't clear is whether the length is also a multiple of 4 
in this case.  In other words, does "aligned" state the alignment of source and 
destination and length, or only of the addresses and the length may still be 
odd?

paul



[PATCH, pdp11] Enable LRA for pdp11

2018-10-03 Thread Paul Koning
This patch enables LRA register allocator support for pdp11.  For the moment, 
it is invoked via a switch (-mlra) as is done by a few other targets.  There 
are some code quality issues and test suite rejects to be looked at, but it's 
close enough for initial delivery.

Thanks to Segher for pointing out that LRA requires define_memory_constraint, 
while the old allocator is happy when memory operands use define_constraint. 

Committed.

paul

ChangeLog:

2018-10-03  Paul Koning  

Enable LRA register allocator for PDP11.
* config/pdp11/constraints.md (Q): Use define_memory_constraints.
(R): Likewise.
(D): Likewise.
* config/pdp11/pdp11.c (pdp11_lra_p): New function.
* config/pdp11/pdp11.opt (-mlra): New option.
* doc/invoke.texi (PDP-11 Options): Document -mlra.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 264818)
+++ doc/invoke.texi (revision 264819)
@@ -1007,7 +1007,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
 -mint32  -mno-int16 -mint16  -mno-int32 @gol
 -mfloat32  -mno-float64 -mfloat64  -mno-float32 @gol
--msplit -munix-asm  -mdec-asm -mgnu-asm}
+-msplit -munix-asm  -mdec-asm -mgnu-asm -mlra}
 
 @emph{picoChip Options}
 @gccoptlist{-mae=@var{ae_type}  -mvliw-lookahead=@var{N} @gol
@@ -22721,6 +22721,11 @@ Use DEC assembler syntax.
 @item -mgnu-asm
 @opindex mgnu-asm
 Use GNU assembler syntax.  This is the default.
+
+@item -mlra
+@opindex mlra
+Use the new LRA register allocator.  By default, the old ``reload''
+allocator is used.
 @end table
 
 @node picoChip Options
Index: config/pdp11/constraints.md
===
--- config/pdp11/constraints.md (revision 264818)
+++ config/pdp11/constraints.md (revision 264819)
@@ -70,19 +70,19 @@
   (and (match_code "const_double")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
-(define_constraint "Q"
+(define_memory_constraint "Q"
   "Memory reference that requires an additional word after the opcode"
   (and (match_code "mem")
(match_test "memory_address_p (GET_MODE (op), XEXP (op, 0))
 && !simple_memory_operand (op, GET_MODE (op))")))
 
-(define_constraint "R"
+(define_memory_constraint "R"
   "Memory reference that is encoded within the opcode"
   (and (match_code "mem")
(match_test "memory_address_p (GET_MODE (op), XEXP (op, 0))
 && simple_memory_operand (op, GET_MODE (op))")))
 
-(define_constraint "D"
+(define_memory_constraint "D"
   "Memory reference that is encoded within the opcode, and not push or pop"
   (and (match_code "mem")
(match_test "memory_address_p (GET_MODE (op), XEXP (op, 0))
Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 264818)
+++ config/pdp11/pdp11.c(revision 264819)
@@ -230,7 +230,7 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
pdp11_preferred_output_reload_class
 
 #undef  TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P pdp11_lra_p
 
 #undef  TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P pdp11_legitimate_address_p
@@ -991,6 +991,12 @@ pdp11_assemble_integer (rtx x, unsigned int size,
 }
 
 
+static bool
+pdp11_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
 /* Register to register moves are cheap if both are general
registers.  */
 static int 
Index: config/pdp11/pdp11.opt
===
--- config/pdp11/pdp11.opt  (revision 264818)
+++ config/pdp11/pdp11.opt  (revision 264819)
@@ -73,3 +73,7 @@ Target has split I
 munix-asm
 Target RejectNegative Report Mask(UNIX_ASM) Negative(mdec-asm)
 Use UNIX assembler syntax.
+
+mlra
+Target Report Mask(LRA)
+Use LRA register allocator



Re: [PATCH] middle-end/81035: Mention that noreturn suppresses tail call optimization

2018-09-21 Thread Paul Koning



> On Sep 21, 2018, at 2:17 PM, Florian Weimer  wrote:
> 
> * Segher Boessenkool:
> 
>> On Fri, Sep 21, 2018 at 12:59:27PM +0200, Florian Weimer wrote:
>>> 2018-09-21  Florian Weimer  
>>> 
>>> PR middle-end/81035
>>> * doc/extend.texi (Common Function Attributes): Mention that
>>> noreturn suppresses tail call optimization.
>> 
>>> +In order to preserve backtraces, GCC will never turn calls to
>>> +@code{noreturn} functions into tail calls.
>> 
>> Should we document this?  Shouldn't we fix it, instead?
> 
> Fix how?  What is currently broken?
> 
> For things like assertion failures, we do not want to replace the
> current stack frame with that of __assert_fail, I think.

I agree.  Also, tailcalls are optimizations.  Speed optimizing noreturn calls 
is obviously not interesting.  Calls to noreturn functions are short, and 
turning them into a jump probably makes no difference in size, or if it does, 
not enough to matter.

paul



Re: [PATCH 04/25] SPECIAL_REGNO_P

2018-09-13 Thread Paul Koning



> On Sep 13, 2018, at 10:58 AM, Andrew Stubbs  wrote:
> 
> On 13/09/18 15:49, Paul Koning wrote:
>> It's ambiguous, because the last sentence of that paragraph says "addm3 is 
>> used if addptrm3 is not defined."
> 
> I didn't read that as ambiguous; I read it as addm3 is assumed to work fine 
> when addptr is not defined.
> 
>> I don't know of any change in this area.  All I know is that pdp11 has adds 
>> that clobber CC and it doesn't define addptrm3, relying on that last 
>> sentence.  I've tried LRA and for the most part it compiles successfully, I 
>> suppose I should verify the generated code based on the point you raised.  
>> If I really have to define addptr, I'm in trouble because  save/restore CC 
>> is not easy on pdp11.
> 
> The code was added because we had a number of testcases that failed at 
> runtime without it.
> 
> Admittedly, that was in a GCC 7 code-base, and I can't reproduce the failure 
> with one of those test cases now (with addptr deleted), but possibly that's 
> just noise.

Possibly relevant is that pdp11 is a "type 2" CC setting target, one where the 
machine description doesn't mention CC until after reload.  So if reload (LRA) 
is generating adds, the CC effect of that is invisible anyway until later 
passes that deal with the resulting clobbers and elimination, or not, of 
compares.

If that's what this is all about, some documentation clarification would help.  
Can someone confirm (or refute) my guess?

paul




Re: [PATCH 04/25] SPECIAL_REGNO_P

2018-09-13 Thread Paul Koning



> On Sep 13, 2018, at 10:39 AM, Andrew Stubbs  wrote:
> 
> On 13/09/18 15:16, Paul Koning wrote:
>> If you don't have machine operations that add without messing with
>> condition codes, wouldn't it make sense to omit the definition of the
>> add-pointer patterns?  GCC will build things out of normal
>> (CC-clobbering) adds if there are no add-pointer operations, which
>> may well be more efficient in most cases than explicitly
>> saving/restoring a CC that may in fact not matter right at that
>> spot.
> 
> I thought the whole point of addptr is that it *is* needed when add
> clobbers CC? As in, LRA spills are malformed without this.
> 
> Did something change? The internals manual still says "It only needs to
> be defined if addm3 sets the condition code."

It's ambiguous, because the last sentence of that paragraph says "addm3 is used 
if addptrm3 is not defined."  

I don't know of any change in this area.  All I know is that pdp11 has adds 
that clobber CC and it doesn't define addptrm3, relying on that last sentence.  
I've tried LRA and for the most part it compiles successfully, I suppose I 
should verify the generated code based on the point you raised.  If I really 
have to define addptr, I'm in trouble because  save/restore CC is not easy on 
pdp11.

paul



Re: [PATCH 04/25] SPECIAL_REGNO_P

2018-09-13 Thread Paul Koning



> On Sep 13, 2018, at 10:08 AM, Andrew Stubbs  wrote:
> 
> On 13/09/18 11:01, Andrew Stubbs wrote:
>> The assert is caused because the def-use chains indicate that SCC conflicts 
>> with itself. I suppose the question is why is it doing that, but it's 
>> probably do do with that being a special register that gets used in split2 
>> (particularly by the addptrdi3 pattern). Although, those patterns are 
>> careful to save SCC to one side and then restore it again after, so I'd have 
>> thought the DF analysis would work out?
> 
> I think I may have a theory on this one now
> 
> The addptrdi3 pattern must use two 32-bit adds with a carry in SCC, but 
> addptr patterns are not allowed to clobber SCC, so the splitter carefully 
> saves and restores the old value.

If you don't have machine operations that add without messing with condition 
codes, wouldn't it make sense to omit the definition of the add-pointer 
patterns?  GCC will build things out of normal (CC-clobbering) adds if there 
are no add-pointer operations, which may well be more efficient in most cases 
than explicitly saving/restoring a CC that may in fact not matter right at that 
spot.

paul



Trampolines and descriptors

2018-09-06 Thread Paul Koning
This came up in a discussion of an or1k patch, but it left me wondering so I'll 
raise the question again.

Are function descriptors meaningful outside Ada?  The internals manual seems to 
say yes, if enabled they will be used -- instead of trampolines -- also for C 
nested functions.

Is that correct?  It seems that this is worth using for any machine where it's 
desirable to avoid executing stack data.

paul

> On Aug 31, 2018, at 9:19 AM, Paul Koning  wrote:
> 
> 
> 
>> On Aug 30, 2018, at 9:02 PM, Jeff Law  wrote:
>> 
>> On 08/30/2018 10:58 AM, Richard Henderson wrote:
>>> On 08/28/2018 07:13 AM, Jeff Law wrote:
>>>> Please consider using function descriptors rather than trampolines.
>>>> This allows you to make the stack non-executable at all times which is
>>>> good from a security standpoint.  The downside is the indirect calling
>>>> mechanism has to change slightly to distinguish between a simple
>>>> indirect call and one through a function descriptor (usually by having a
>>>> low bit on to indicate the latter).  GIven this is an ABI change, now is
>>>> probably the last opportunity to make this change.
>>> 
>>> Correct me if I'm wrong here:
>>> 
>>> Define TARGET_CUSTOM_FUNCTION_DESCRIPTORS to an appropriate value -- easy 
>>> for a
>>> RISC target -- and that's it.
>>> 
>>> Further, it pretty much only gets used by the Ada front end.  One should not
>>> expect these to be used by the C front end nested functions.
>> I thought it was used more extensively than that...  Thanks for checking
>> into it though.
>> 
>> Jeff
> 
> My impression from reading the internals manual is that it's an alternative 
> to trampolines -- and in fact it appears to suggest it's a superior 
> alternative.  I've been planning to try turning it on for pdp11 where 
> executable stacks can be problematic.  (For that matter, they are on lots of 
> other machines -- which is why descriptors instead of trampolines sounds like 
> a good thing.)
> 
>   paul
> 



Re: gcc books, gcc debug (errata)

2018-09-04 Thread Paul Koning



> On Sep 4, 2018, at 12:40 PM, gérard Calliet  
> wrote:
> 
> (our build version is 4.7.3)
> 
> Hello,
> 
> I'm just a beginner in gcc. (I contributed to the rebuild of the gnat ada 
> compiler (version 3.4.7) on OpenVMS last year: 
> https://github.com/AdaLabs/gnat-vms).
> 
> As I do love books and encyclopedic learning attitude, I searched for a good 
> book about gcc.

My favorite resource is the GCC Internals manual, which is part of the GCC 
distribution.  Unlike the internals manuals from some other GNU projects, it's 
quite thorough.  Not everything is covered deep enough, but it certainly 
delivers a solid basis on many aspects of the system.

paul




Re: Even numbered register pairs restriction on some instructions

2018-09-03 Thread Paul Koning



> On Sep 3, 2018, at 1:25 PM, Matthew Malcomson  
> wrote:
> 
>>> 
>>> Thanks for the suggestions,
>>> I've had a look into these, and unfortunately it seems they have the same 
>>> problem I've been hitting before.
>>> 
>>> The use of the TARGET_HARD_REGNO_MODE_OK macro limits all uses of registers 
>>> in a given mode (so that we wouldn't be able to use register pairs 
>>> beginning with odd registers for other instructions), while using an 
>>> EVEN_REGS register class won't work in modes that span more than one hard 
>>> register (as all hard registers covered by a pseudo register must be in the 
>>> same class).
>> Is that so?  I don't remember that restriction, and certainly pdp11 does not 
>> do this.  Could that be why LRA is failing in some of my tests?  The "old" 
>> reload pass makes no objections.
>> 
>> Note though that there are two possible ways to look at this.
>> 
>> 1. Is there a register class such that all of the hard registers for a given 
>> mode are in that class?
>> 
>> 2. Is the value returned by REGNO_REG_CLASS the same for each of the hard 
>> registers of a register pair?
>> 
>> For pdp11, (1) is true (they are both members of GENERAL_REGS) while (2) is 
>> false (for the even register, GENERAL_REGS is returned, vs. MUL_REGS for the 
>> odd one).
>> 
>>  paul
>> 
> Apologies, I was unclear there.
> 
> When I said "all hard registers covered by a pseudo register must be in the 
> same class" I meant "in order for a register constraint to that class to be 
> satisfied".
> 
> i.e. if you were to use the "d" constraint in the SI operand of the 
> "mulhisi3" pattern, then that constraint would not allow any register pair to 
> be used since all register pairs use an odd register.
> 
> That is why "register classes cannot be used to enforce a requirement for a 
> register pair to start with an even-numbered register" as mentioned in 
> https://gcc.gnu.org/onlinedocs/gccint/Register-Classes.html 
>  .
> 
> Matthew

I see what you mean.  You want, say, SImode to be sometimes in an even/odd 
pair, and sometimes in any pair.  I'd like the same thing, in fact, or more 
generally still, I'd like SImode sometimes just to be a pair of HImode values 
that don't have to be both in registers at all, never mind in adjacent 
registers.  There doesn't seem to be a way to do that.  So pdp11, for example, 
always has SImode in even/odd pairs even though only MUL and DIV require that, 
no one else.

paul



Re: Even numbered register pairs restriction on some instructions

2018-09-03 Thread Paul Koning



> On Sep 3, 2018, at 12:10 PM, Matthew Malcomson  
> wrote:
> 
>> 
>> I think you can use pdp11 as an example, it does two things that are similar 
>> to what you're describing.
>> 
>> One is that it requires SImode to go into an even regno, and indicates that 
>> it uses two registers.  See TARGET_HARD_REGNO_MODE_OK and 
>> TARGET_HARD_REGNO_NREGS.
>> 
>> The other is that it has one instruction that wants an odd (!) register: 16 
>> bit multiply.  Multiply is weird: if you give it an even destination 
>> register it produces a 32 bit result in that register pair, with an odd 
>> register number it produces a 16 bit result.  So pdp11.md defines both 
>> "mulhi3" and "mulsihi3" insns.  The latter has an SImode output so that uses 
>> the even numbered register, as I already described.  The former uses a 
>> machine-specific constraint "d". That is defined in constraints.md to mean a 
>> register in class MUL_REGS.  pdp11.h defines that name and what registers it 
>> refers to, and pdp11.h does the reverse mapping (REGNO_REG_CLASS).
>> 
>> If you want even register numbers for some instructions but that's not tied 
>> to a specific type size (like SImode in my case), I think you'd want to use 
>> something analogous to the MUL_REGS thing I described.  Say, "EVEN_REGS".  
>> REGNO_REG_CLASS would report even regnum to be in EVEN_REGS, odd in 
>> GENERAL_REGS. The bitmaps for REG_CLASS_CONTENTS would show that EVEN_REGS 
>> contains only even numbered registers while GENERAL_REGS contains both odd 
>> and even.  And you'd defined a register constraint which matches EVEN_REGS.  
>> Then the instructions where you want them would use that constraint.
>> 
>>  paul
>> 
> 
>> Yes, it's possible.  You can look at TDmode (128-bit decimal floating point)
>> on powerpc64*-linux, which is only allowed in even-odd register pairs.
>> It's in *all* cases though, not some of the time.
>> 
>> Peter
>> 
> 
> Thanks for the suggestions,
> I've had a look into these, and unfortunately it seems they have the same 
> problem I've been hitting before.
> 
> The use of the TARGET_HARD_REGNO_MODE_OK macro limits all uses of registers 
> in a given mode (so that we wouldn't be able to use register pairs beginning 
> with odd registers for other instructions), while using an EVEN_REGS register 
> class won't work in modes that span more than one hard register (as all hard 
> registers covered by a pseudo register must be in the same class).

Is that so?  I don't remember that restriction, and certainly pdp11 does not do 
this.  Could that be why LRA is failing in some of my tests?  The "old" reload 
pass makes no objections.

Note though that there are two possible ways to look at this.

1. Is there a register class such that all of the hard registers for a given 
mode are in that class?

2. Is the value returned by REGNO_REG_CLASS the same for each of the hard 
registers of a register pair?

For pdp11, (1) is true (they are both members of GENERAL_REGS) while (2) is 
false (for the even register, GENERAL_REGS is returned, vs. MUL_REGS for the 
odd one).

paul



Re: Even numbered register pairs restriction on some instructions

2018-08-31 Thread Paul Koning



> On Aug 31, 2018, at 11:41 AM, Matthew Malcomson  
> wrote:
> 
> Hi there,
> 
> I'm looking into whether it's possible to require even numbered registers on
> modes that need more than one hard-register to represent them. But only in
> some cases.
> 
> The problem is the one mentioned explicitly here
> https://gcc.gnu.org/onlinedocs/gccint/Register-Classes.html about enforcing a
> requirement for a register pair to start with an even-numbered register.
> I can't use TARGET_HARD_REGNO_MODE_OK as suggested in that document because 
> I'd
> like to allow register pairs starting with odd-numbered registers in general,
> but not in some specific cases.
> 
> From a comment in gcc/config/sparc/constraints.md above the constraint "U" it
> looks like there isn't a way to define such a constraint when using LRA, and I
> haven't found any hook that could do so but I'm hoping there's something I'm
> missing.

I think you can use pdp11 as an example, it does two things that are similar to 
what you're describing.

One is that it requires SImode to go into an even regno, and indicates that it 
uses two registers.  See TARGET_HARD_REGNO_MODE_OK and TARGET_HARD_REGNO_NREGS.

The other is that it has one instruction that wants an odd (!) register: 16 bit 
multiply.  Multiply is weird: if you give it an even destination register it 
produces a 32 bit result in that register pair, with an odd register number it 
produces a 16 bit result.  So pdp11.md defines both "mulhi3" and "mulsihi3" 
insns.  The latter has an SImode output so that uses the even numbered 
register, as I already described.  The former uses a machine-specific 
constraint "d".  That is defined in constraints.md to mean a register in class 
MUL_REGS.  pdp11.h defines that name and what registers it refers to, and 
pdp11.h does the reverse mapping (REGNO_REG_CLASS).

If you want even register numbers for some instructions but that's not tied to 
a specific type size (like SImode in my case), I think you'd want to use 
something analogous to the MUL_REGS thing I described.  Say, "EVEN_REGS".  
REGNO_REG_CLASS would report even regnum to be in EVEN_REGS, odd in 
GENERAL_REGS.  The bitmaps for REG_CLASS_CONTENTS would show that EVEN_REGS 
contains only even numbered registers while GENERAL_REGS contains both odd and 
even.  And you'd defined a register constraint which matches EVEN_REGS.  Then 
the instructions where you want them would use that constraint.

paul



Re: [PATCH 3/3] or1k: gcc: initial support for openrisc

2018-08-31 Thread Paul Koning



> On Aug 30, 2018, at 9:02 PM, Jeff Law  wrote:
> 
> On 08/30/2018 10:58 AM, Richard Henderson wrote:
>> On 08/28/2018 07:13 AM, Jeff Law wrote:
>>> Please consider using function descriptors rather than trampolines.
>>> This allows you to make the stack non-executable at all times which is
>>> good from a security standpoint.  The downside is the indirect calling
>>> mechanism has to change slightly to distinguish between a simple
>>> indirect call and one through a function descriptor (usually by having a
>>> low bit on to indicate the latter).  GIven this is an ABI change, now is
>>> probably the last opportunity to make this change.
>> 
>> Correct me if I'm wrong here:
>> 
>> Define TARGET_CUSTOM_FUNCTION_DESCRIPTORS to an appropriate value -- easy 
>> for a
>> RISC target -- and that's it.
>> 
>> Further, it pretty much only gets used by the Ada front end.  One should not
>> expect these to be used by the C front end nested functions.
> I thought it was used more extensively than that...  Thanks for checking
> into it though.
> 
> Jeff

My impression from reading the internals manual is that it's an alternative to 
trampolines -- and in fact it appears to suggest it's a superior alternative.  
I've been planning to try turning it on for pdp11 where executable stacks can 
be problematic.  (For that matter, they are on lots of other machines -- which 
is why descriptors instead of trampolines sounds like a good thing.)

paul



Re: Trying to convert to LRA, running into an ICE (infinite reload loop)

2018-08-20 Thread Paul Koning



> On Aug 20, 2018, at 7:17 AM, Segher Boessenkool  
> wrote:
> 
> On Tue, Aug 14, 2018 at 03:32:01PM -0400, Paul Koning wrote:
>> I'm trying to convert the pdp11 target to use LRA.
>> 
>> A lot of it "just works".  But one of the test suite files fails in a way 
>> that I can't figure out at all.  I'm hoping for some help or hints to track 
>> down the cause and come up with a fix.
>> 
>> The failing program is testsuite/gcc.c-torture/compile/ifreg.c:
> 
> [ snip ]
> 
>> I don't know how to read that, and I don't know what to look for to find a 
>> cause or a cure.  Apart from turning on LRA, the code I'm running is what's 
>> in the repository, if anyone wants to look at the md file or other bits to 
>> offer suggestions.
> 
> I think this is because the PDP port still has subregs of memory.  You
> can get rid of that by defining some (trivial) scheduling.

I did so, and insn-attr-common.h now says:
  #define INSN_SCHEDULING

But the bug is still there, unchanged.  

The input insn that sends LRA into an infinite loop isn't a memory subreg; the 
ira dump file shows it this way:

(insn 18 22 19 2 (set (subreg:HI (reg/v:DI 26 [ b ]) 0)
(const_int 0 [0])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
 (nil))

So a subreg set to zero.  And in the reload dump it shows this way:

(insn 18 22 25 2 (set (reg:HI 32)
(const_int 0 [0])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
 (nil))

which makes sense given that the register width (word width) is 2. But that's 
the insn that is then followed by an infinite number of register copy 
operations.

paul



Re: Trying to convert to LRA, running into an ICE (infinite reload loop)

2018-08-20 Thread Paul Koning



> On Aug 20, 2018, at 7:17 AM, Segher Boessenkool  
> wrote:
> 
> On Tue, Aug 14, 2018 at 03:32:01PM -0400, Paul Koning wrote:
>> I'm trying to convert the pdp11 target to use LRA.
>> 
>> A lot of it "just works".  But one of the test suite files fails in a way 
>> that I can't figure out at all.  I'm hoping for some help or hints to track 
>> down the cause and come up with a fix.
>> 
>> The failing program is testsuite/gcc.c-torture/compile/ifreg.c:
> 
> [ snip ]
> 
>> I don't know how to read that, and I don't know what to look for to find a 
>> cause or a cure.  Apart from turning on LRA, the code I'm running is what's 
>> in the repository, if anyone wants to look at the md file or other bits to 
>> offer suggestions.
> 
> I think this is because the PDP port still has subregs of memory.  You
> can get rid of that by defining some (trivial) scheduling.

I don't know what the port does that enables memory subregs.  If I read gccint 
correctly, just turning on scheduling should make them go away?  Ok, adding 
scheduling was on my list anyway; pdp11s with float are dual pipe machines so 
it's actually a meaningful thing to do.

The internals manual seems to say that memory subregs are an old mechanism that 
should still work, give or take.  If indeed it breaks LRA perhaps the 
documentation should be updated to say that it is an obsolete mechanism that no 
longer works and needs to be disabled.

paul



Re: Trying to convert to LRA, running into an ICE (infinite reload loop)

2018-08-15 Thread Paul Koning



> On Aug 15, 2018, at 1:01 AM, Jeff Law  wrote:
> 
> On 08/14/2018 01:32 PM, Paul Koning wrote:
>> I'm trying to convert the pdp11 target to use LRA.
>> 
>> A lot of it "just works".  But one of the test suite files fails in a way 
>> that I can't figure out at all.  I'm hoping for some help or hints to track 
>> down the cause and come up with a fix.
>> 
>> The failing program is testsuite/gcc.c-torture/compile/ifreg.c:
>> 
>> union foo
>> {
>>  float f;
>>  int i;
>> };
>> 
>> foo (int a, float c)
>> {
>>  union foo b;
>>  b.i = a;
>>  return b.f + c;
>> }
>> 
>> When compiled in LRA mode, I get this:
>> 
>> during RTL pass: reload
>> dump file: ifreg.c.274r.reload
>> ../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c: In function ‘foo’:
>> ../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c:12:1: internal 
>> compiler error: Max. number of generated reload insns per insn is achieved 
>> (90)
>> 
>> I fiddled a bit with the mov patterns, for example disabling the 
>> patterns that work on multi-word values (SI and DI mode, since pdp11 word is 
>> HImode).  That doesn't fix (or create) the failure, though the specific 
>> insns change as a result.
>> 
>> What I see going in to reload (from the "ira" dump file) is:
>> 
>> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (note 2 5 4 2 NOTE_INSN_DELETED)
>> (note 4 2 22 2 NOTE_INSN_FUNCTION_BEG)
>> (insn 22 4 18 2 (clobber (reg/v:DI 26 [ b ])) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 -1
>> (nil))
>> (insn 18 22 19 2 (set (subreg:HI (reg/v:DI 26 [ b ]) 0)
>>(const_int 0 [0])) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
>> (nil))
>> (insn 19 18 20 2 (set (subreg:HI (reg/v:DI 26 [ b ]) 2)
>>(const_int 0 [0])) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
>> (nil))
>> ...
>> 
>> Reload shows this:
>> 
>> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (note 2 5 4 2 NOTE_INSN_DELETED)
>> (note 4 2 22 2 NOTE_INSN_FUNCTION_BEG)
>> (insn 22 4 18 2 (clobber (reg/v:DI 26 [ b ])) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 -1
>> (nil))
>> (insn 18 22 25 2 (set (reg:HI 32)
>>(const_int 0 [0])) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
>> (nil))
>> (insn 25 18 26 2 (set (reg:HI 33)
>>(reg:HI 32)) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
>> (nil))
>> (insn 26 25 27 2 (set (reg:HI 34)
>>(reg:HI 33)) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
>> (nil))
>> (insn 27 26 28 2 (set (reg:HI 35)
>>(reg:HI 34)) 
>> "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
>> (nil))
>> ... with insns that do nothing, like the one above, repeating until it gives 
>> up.
>> 
>> At the top of the dump file where LRA is reporting what it's doing, I see 
>> this:
>> 
>> ** Local #1: **
>> 
>> Spilling non-eliminable hard regs: 6
>> New elimination table:
>> Can eliminate 15 to 6 (offset=10, prev_offset=0)
>> Can eliminate 15 to 5 (offset=4, prev_offset=0)
>> Can eliminate 14 to 6 (offset=8, prev_offset=0)
>> Can eliminate 14 to 5 (offset=0, prev_offset=0)
>>0 Non input pseudo reload: reject++
>>  alt=0,overall=7,losers=1,rld_nregs=1
>>0 Non input pseudo reload: reject++
>>  alt=1,overall=7,losers=1,rld_nregs=1
>>alt=2: Bad operand -- refuse
>>alt=3: Bad operand -- refuse
>>   Choosing alt 0 in insn 18:  (0) =rR  (1) rRN {movhi}
> We've selected alternative 0 for insn 18.  I believe that's the rRN ->
> =rR alternative.

Yes, specifically it matches the "N" source constraint (literal zero operand).

>>  Creating newreg=32, assigning class GENERAL_REGS to r32
> The insn did not match its constraints.  So some kind of reload is
> needed.  We allocate a new reload register r32.
> 
>>   18: r32:HI=0
>>Inserting insn reload after:
> And we try to initialize r32 to 0.  Which of course needs another reload.
> 
> I suspect LRA is somehow missing that the operand could live in memory.

Not exactly memory, the source operand is constant 0 and the resulting machine 
instruction will be a "clr".  The puzzle is why it doesn't see it as matching 
when (a) it clearly does and (b) LRA knew that because it told us so in that 
first match (it chose alternative zero).

What should I look at next to try to figure out why LRA is getting confused?

paul



Trying to convert to LRA, running into an ICE (infinite reload loop)

2018-08-14 Thread Paul Koning
I'm trying to convert the pdp11 target to use LRA.

A lot of it "just works".  But one of the test suite files fails in a way that 
I can't figure out at all.  I'm hoping for some help or hints to track down the 
cause and come up with a fix.

The failing program is testsuite/gcc.c-torture/compile/ifreg.c:

union foo
{
  float f;
  int i;
};

foo (int a, float c)
{
  union foo b;
  b.i = a;
  return b.f + c;
}

When compiled in LRA mode, I get this:

during RTL pass: reload
dump file: ifreg.c.274r.reload
../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c: In function ‘foo’:
../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c:12:1: internal compiler 
error: Max. number of generated reload insns per insn is achieved (90)

I fiddled a bit with the mov patterns, for example disabling the patterns 
that work on multi-word values (SI and DI mode, since pdp11 word is HImode).  
That doesn't fix (or create) the failure, though the specific insns change as a 
result.

What I see going in to reload (from the "ira" dump file) is:

(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 5 4 2 NOTE_INSN_DELETED)
(note 4 2 22 2 NOTE_INSN_FUNCTION_BEG)
(insn 22 4 18 2 (clobber (reg/v:DI 26 [ b ])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 -1
 (nil))
(insn 18 22 19 2 (set (subreg:HI (reg/v:DI 26 [ b ]) 0)
(const_int 0 [0])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
 (nil))
(insn 19 18 20 2 (set (subreg:HI (reg/v:DI 26 [ b ]) 2)
(const_int 0 [0])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
 (nil))
...

Reload shows this:

(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 5 4 2 NOTE_INSN_DELETED)
(note 4 2 22 2 NOTE_INSN_FUNCTION_BEG)
(insn 22 4 18 2 (clobber (reg/v:DI 26 [ b ])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 -1
 (nil))
(insn 18 22 25 2 (set (reg:HI 32)
(const_int 0 [0])) 
"../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 21 {movhi}
 (nil))
(insn 25 18 26 2 (set (reg:HI 33)
(reg:HI 32)) "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 
21 {movhi}
 (nil))
(insn 26 25 27 2 (set (reg:HI 34)
(reg:HI 33)) "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 
21 {movhi}
 (nil))
(insn 27 26 28 2 (set (reg:HI 35)
(reg:HI 34)) "../../gcc/gcc/testsuite/gcc.c-torture/compile/ifreg.c":10 
21 {movhi}
 (nil))
... with insns that do nothing, like the one above, repeating until it gives up.

At the top of the dump file where LRA is reporting what it's doing, I see this:

** Local #1: **

   Spilling non-eliminable hard regs: 6
New elimination table:
Can eliminate 15 to 6 (offset=10, prev_offset=0)
Can eliminate 15 to 5 (offset=4, prev_offset=0)
Can eliminate 14 to 6 (offset=8, prev_offset=0)
Can eliminate 14 to 5 (offset=0, prev_offset=0)
0 Non input pseudo reload: reject++
  alt=0,overall=7,losers=1,rld_nregs=1
0 Non input pseudo reload: reject++
  alt=1,overall=7,losers=1,rld_nregs=1
alt=2: Bad operand -- refuse
alt=3: Bad operand -- refuse
 Choosing alt 0 in insn 18:  (0) =rR  (1) rRN {movhi}
  Creating newreg=32, assigning class GENERAL_REGS to r32
   18: r32:HI=0
Inserting insn reload after:
   25: r26:DI#0=r32:HI

0 Non input pseudo reload: reject++
1 Non pseudo reload: reject++
Cycle danger: overall += LRA_MAX_REJECT
  alt=0,overall=608,losers=1,rld_nregs=1
0 Non input pseudo reload: reject++
alt=1: Bad operand -- refuse
alt=2: Bad operand -- refuse
alt=3: Bad operand -- refuse
 Choosing alt 0 in insn 25:  (0) =rR  (1) rRN {movhi}
  Creating newreg=33, assigning class GENERAL_REGS to r33
   25: r33:HI=r32:HI
Inserting insn reload after:
   26: r26:DI#0=r33:HI

0 Non input pseudo reload: reject++
1 Non pseudo reload: reject++
Cycle danger: overall += LRA_MAX_REJECT
  alt=0,overall=608,losers=1,rld_nregs=1
0 Non input pseudo reload: reject++
alt=1: Bad operand -- refuse
alt=2: Bad operand -- refuse
alt=3: Bad operand -- refuse
 Choosing alt 0 in insn 26:  (0) =rR  (1) rRN {movhi}
  Creating newreg=34, assigning class GENERAL_REGS to r34
   26: r34:HI=r33:HI
Inserting insn reload after:
   27: r26:DI#0=r34:HI
...

I don't know how to read that, and I don't know what to look for to find a 
cause or a cure.  Apart from turning on LRA, the code I'm running is what's in 
the repository, if anyone wants to look at the md file or other bits to offer 
suggestions.

paul



Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-08-08 Thread Paul Koning
Thanks.  Ok, so the expressions you gave are undefined for x==1, which says 
that substituting something that is also undefined for x==1 is permitted.  You 
can argue from "undefined" rather than relying on IEEE features like NaN or 
infinite.

paul

> On Aug 8, 2018, at 2:57 PM, Giuliano Augusto Faulin Belinassi 
>  wrote:
> 
> Sorry about that. In the e-mail text field I wrote sinh(tanh(x)) and
> cosh(tanh(x)) where it was supposed to be sinh(atanh(x)) and
> cosh(atanh(x)), thus I am talking about the inverse hyperbolic tangent
> function. The patch code and comments are still correct.
> 
> On Wed, Aug 8, 2018 at 10:58 AM, Paul Koning  wrote:
>> Now I'm puzzled.
>> 
>> I don't see how an infinite would show up in the original expression.  I 
>> don't know hyperbolic functions, so I just constructed a small test program, 
>> and the original vs. the substitution you mention are not at all similar.
>> 
>>paul
>> 
>> 
>>> On Aug 7, 2018, at 4:42 PM, Giuliano Augusto Faulin Belinassi 
>>>  wrote:
>>> 
>>> That is a good question because I didn't know that such targets
>>> exists. Any suggestion?
>>> 
>>> 
>>> On Tue, Aug 7, 2018 at 5:29 PM, Paul Koning  wrote:
>>>> 
>>>> 
>>>>> On Aug 7, 2018, at 4:00 PM, Giuliano Augusto Faulin Belinassi 
>>>>>  wrote:
>>>>> 
>>>>> Related with bug 86829, but for hyperbolic trigonometric functions.
>>>>> This patch adds substitution rules to both sinh(tanh(x)) -> x / sqrt(1
>>>>> - x*x) and cosh(tanh(x)) -> 1 / sqrt(1 - x*x). Notice that the both
>>>>> formulas has division by 0, but it causes no harm because 1/(+0) ->
>>>>> +infinity, thus the math is still safe.
>>>> 
>>>> What about non-IEEE targets that don't have "infinite" in their float 
>>>> representation?
>>>> 
>>>>   paul
>>>> 
>>>> 
>> 



Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-08-08 Thread Paul Koning
Now I'm puzzled.

I don't see how an infinite would show up in the original expression.  I don't 
know hyperbolic functions, so I just constructed a small test program, and the 
original vs. the substitution you mention are not at all similar.

paul


> On Aug 7, 2018, at 4:42 PM, Giuliano Augusto Faulin Belinassi 
>  wrote:
> 
> That is a good question because I didn't know that such targets
> exists. Any suggestion?
> 
> 
> On Tue, Aug 7, 2018 at 5:29 PM, Paul Koning  wrote:
>> 
>> 
>>> On Aug 7, 2018, at 4:00 PM, Giuliano Augusto Faulin Belinassi 
>>>  wrote:
>>> 
>>> Related with bug 86829, but for hyperbolic trigonometric functions.
>>> This patch adds substitution rules to both sinh(tanh(x)) -> x / sqrt(1
>>> - x*x) and cosh(tanh(x)) -> 1 / sqrt(1 - x*x). Notice that the both
>>> formulas has division by 0, but it causes no harm because 1/(+0) ->
>>> +infinity, thus the math is still safe.
>> 
>> What about non-IEEE targets that don't have "infinite" in their float 
>> representation?
>> 
>>paul
>> 
>> 



Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-08-07 Thread Paul Koning



> On Aug 7, 2018, at 4:00 PM, Giuliano Augusto Faulin Belinassi 
>  wrote:
> 
> Related with bug 86829, but for hyperbolic trigonometric functions.
> This patch adds substitution rules to both sinh(tanh(x)) -> x / sqrt(1
> - x*x) and cosh(tanh(x)) -> 1 / sqrt(1 - x*x). Notice that the both
> formulas has division by 0, but it causes no harm because 1/(+0) ->
> +infinity, thus the math is still safe.

What about non-IEEE targets that don't have "infinite" in their float 
representation?

paul




Re: [PATCH] Use getentropy() for seeding PRNG

2018-08-03 Thread Paul Koning



> On Aug 3, 2018, at 9:19 AM, Janne Blomqvist  wrote:
> 
> The getentropy function, found on Linux, OpenBSD, and recently also
> FreeBSD, can be used to get random bytes to initialize the PRNG.  It
> is similar to the traditional way of reading from /dev/urandom, but
> being a system call rather than a special file, it doesn't suffer from
> problems like running out of file descriptors, or failure when running
> in a container where /dev/urandom is not available.

I don't understand why this is useful.  

getrandom, and /dev/random, are for strong (secure) RNGs.  A PRNG is something 
entirely different.  By saying we use entropy to seed it, we blur the 
distinction and create the false impression that the PRNG has security 
properties.

It would be better to initialize with something more obviously insecure, like 
gettimeofday().

paul




Re: [patch] improve internals documentation for nested function descriptors

2018-07-27 Thread Paul Koning



> On Jul 27, 2018, at 4:39 PM, Sandra Loosemore  wrote:
> 
> Apropos of the discussion about improving the docs for 
> TARGET_CUSTOM_FUNCTION_DESCRIPTORS in the context of the C-SKY port 
> submission,
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01454.html
> 
> here is the patch I've come up with based on reading the source.  Is this 
> technically accurate?  Any suggestions on how to improve it further?
> 
> -Sandra
> 

Nice.  That tells me a lot more, so I hope to hear confirmation it's accurate.

A nit: "word aligned" -- you used thas as a synonym for "multiple of 4 aligned" 
but some of us have 2-byte words.

On the function descriptor concept generally: it wasn't clear to me whether a 
target that wants to use this mechanism needs to do anything to tell GCC how to 
construct descriptors.  Does GCC have a standard recipe it uses that "should" 
work for any target?  I assume so because there aren't any other target hooks 
or special patterns for this.  It would be helpful to say so explicitly.

paul



Re: [PATCH 09/11] pdp11 - example of a port not needing a speculation barrier

2018-07-27 Thread Paul Koning



> On Jul 27, 2018, at 5:37 AM, Richard Earnshaw  
> wrote:
> 
> 
> This patch is intended as an example of all that is needed if the
> target system doesn't support CPUs that have speculative execution.
> I've chosen the pdp11 port on the basis that it's old enough that this
> is likely to be true for all existing implementations and that there
> is also little chance of that changing in future!
> 
>   * config/pdp11/pdp11.c (TARGET_HAVE_SPECULATION_SAFE_VALUE): Redefine
>   to speculation_safe_value_not_needed.
> ---
> gcc/config/pdp11/pdp11.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> <0009-pdp11-example-of-a-port-not-needing-a-speculation-ba.patch>

Correct, no speculative instruction now, and I don't think any of the people 
constructing PDP11s (yes there are some) are going to be adding it.

Thanks Richard.

paul



Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-26 Thread Paul Koning



> On Jul 26, 2018, at 7:34 PM, Joseph Myers  wrote:
> 
> On Wed, 25 Jul 2018, Richard Earnshaw (lists) wrote:
> 
 Port maintainers DO need to decide what to do about speculation, even if
 it is explicitly that no mitigation is needed.
>>> 
>>> Agreed.  But I didn't yet see a request for maintainers to decide that?
>>> 
>> 
>> consider it made, then :-)
> 
> I suggest the following as an appropriate process for anything needing 
> attention from architecture maintainers:
> 
> * Send a message to the gcc list, starting its own thread, CC:ed to all 
> target architecture maintainers, stating explicitly in its first sentence 
> that it is about something needing action from all such maintainers.

Yes, because it was not clear to me that a patch discussion about a speculation 
builtin was something that every target maintainer was supposed to look at.  
"Speculation" is not a term that shows up in my target...

> ...
> * Over the next few months, send occasional reminders, each including a 
> list of the ports that have not been updated.

Would the GCC Wiki be a good place to collect all the responses and track what 
is still open?  If not, what is a good way to do the tracking?

paul



Re: [2/5] C-SKY port: Backend implementation

2018-07-25 Thread Paul Koning



> On Jul 25, 2018, at 12:50 AM, Jeff Law  wrote:
> 
 ...
>>> It did.  See TARGET_CUSTOM_FUNCTION_DESCRIPTORS and the (relatively few)
>>> ports that define it.
>> 
>> Hmmm, I completely failed to make that connection from the docs -- the
>> whole description of that hook is pretty gibberishy and I thought it was
>> something for targets where the ABI already specifies some "standard
>> calling sequence" using descriptors (C-SKY doesn't), rather than a
>> generic alternative to executable trampolines.  Putting on my doc
>> maintainer hat briefly, I can see this needs a lot of work.  :-(
> Most likely :-)  So many things to do, so little time.
> 
> 
>> 
>> Anyway, is this required for new ports nowadays?  If so, I at least know
>> what to search for now.  At this point I couldn't say whether this would
>> do anything to fix the situation on ck801 targets where there simply
>> aren't enough spare registers available to the trampoline to both hold
>> the static link and do an indirect jump.
> It's not required, but preferred, particularly if the part has an MMU
> that can provide no-execute protections on pages in memory.  If the
> target doesn't have an mmu, then it's of marginal value.
> 
> The key advantage it has over the old trampoline implementation is that
> stacks can remain non-executable, even for Ada and nested functions.
> That's a big win from a security standpoint.

Non-executable stacks are a very good thing.

That said, I also looked at the target hook documentation and was left without 
any clue whatsoever.  It sure isn't clear what powers of two have to do with 
descriptors, or what descriptors have to do with support for nested functions.

Can you suggest places to look to get an understanding of this feature?  It 
sounds like the only other option is "Use the source, Luke".  Any specific 
targets that make a good reference implementation for this?

paul




Re: That light at the end of the tunnel?

2018-07-23 Thread Paul Koning



> On Jul 23, 2018, at 12:21 PM, Jeff Law  wrote:
> 
>> 
> Hell, I'd live with doing a "reasonable effort" for the vast majority of
> our branches.  Other than the trunk, active release branches and a few
> active development branches I don't think we really care about 'em.
> 
> jeff

There are two approaches to conversion: (1) convert what's active and preserve 
the old system indefinitely for reference access; (2) convert everything 100% 
so the old system can be retired.

It seems that Eric has been trying for #2, which is fine if doable.  But #1 is 
also a reasonable option and if the nature of the beast makes #2 acceptable, 
going for #1 is a plan I would definitely support.

paul



Re: ICE building a libsupc++ file, pdp11 target

2018-07-23 Thread Paul Koning



> On Jul 23, 2018, at 10:21 AM, Joseph Myers  wrote:
> 
> On Tue, 17 Jul 2018, Paul Koning wrote:
> 
>> That reveals some things but nothing jumps out at me.  However... pdp11 
>> is an a.out target, not an ELF target.  Would that explain the problem?  
>> If yes, is there a workaround (short of implementing ELF)?
> 
> As there are hardly any targets left without named section support, using 
> ELF might be a good idea so you don't have to deal with the 
> no-named-sections issues.
> 
> The ELF e_machine value EM_PDP11 was assigned to Lars Brinkoff, 
> l...@nocrew.org, 30 May 2002, according to the comments in 
> ch4.eheader.html.  I don't know if an actual ELF ABI has been defined.

I don't know of pdp11 ELF code in binutils.  

The named-section stuff itself doesn't seem to be directly related.  If I run 
the test with the target flag -mdec-asm, it still fails.  That mode does 
support named sections.

I can easily see the issue with the debugger and compare with a target that 
works (vax).  So I should be able to find this, at least once I figure out how 
to turn off address space randomization on my host.

paul



Re: [RFC] Adding Python as a possible language and it's usage

2018-07-20 Thread Paul Koning



> On Jul 20, 2018, at 12:37 PM, Segher Boessenkool  
> wrote:
> 
> On Fri, Jul 20, 2018 at 11:49:05AM +0200, Martin Liška wrote:
>> Fully agree with that. Coming up with a new scripts written in python2 really
>> makes no sense.
> 
> Then python cannot be a build requirement for GCC, since some of our
> primary targets do not ship python3.

Is it required that GCC must build with only the stock support elements on the 
primary target platforms?  Or is it allowed to require installing 
prerequisites?  Yes, some platforms are so far behind they still don't ship 
Python 3, but installing it is straightforward.

paul



Re: Detecting superfluous "else"

2018-07-19 Thread Paul Koning



> On Jul 19, 2018, at 4:49 AM, U.Mutlu  wrote:
> 
> Hi,
> it makes me 'crazy' when I see such if-else constructs:
>  if (x)
>return 7;
>  else
>return 4;
> 
> (Of course in this case one better would use the shorthand "return x ? 7 : 
> 4;", but that's not the issue here)
> 
> The 'else' is obviously superfluous/redundant, ie. unneeded at all:
>  if (x)
>return 7;
>  return 4;
> 
> Is it possible to warn about such unneccessary occurances of "else"?
> If not, then I suggest to add a new warning code -Wsuperfluous-else or 
> -Wredundant-else or so.

I don't see any reason to warn about that code.  It's perfectly valid, and in 
my view is clearer than the alternative.  I've written both but I most often 
write the "else" variant for the reason that it expresses the semantics 
explicitly.

Warnings are appropriate for code that is known to be a source of bugs, or 
where there is a reasonable chance that the intent of the programmer doesn't 
match what was actually written.  That's not the case here.

paul



Re: [RFC] Adding Python as a possible language and it's usage

2018-07-18 Thread Paul Koning



> On Jul 18, 2018, at 1:22 PM, Boris Kolpackov  wrote:
> 
> Paul Koning  writes:
> 
>>> On Jul 18, 2018, at 11:13 AM, Boris Kolpackov  
>>> wrote:
>>> 
>>> I wonder what will be the expected way to obtain a suitable version of
>>> Python if one is not available on the build machine? With awk I can
>>> build it from source pretty much anywhere. Is building newer versions
>>> of Python on older targets a similarly straightforward process (somehow
>>> I doubt it)? What about Windows?
>> 
>> It's the same sort of thing: untar the sources, configure, make, make
>> install.
> 
> Will this also install all the Python packages one might plausible want
> to use in GCC?

It installs the entire standard Python library (corresponding to the 1800+ 
pages of the library manual).  I expect that will easily cover anything GCC 
might want to do.

paul



Re: [RFC] Adding Python as a possible language and it's usage

2018-07-18 Thread Paul Koning



> On Jul 18, 2018, at 11:13 AM, Boris Kolpackov  wrote:
> 
> On Tue, 2018-07-17 at 14:49 +0200, Martin Liška wrote:
> 
>> My question is simple: can we starting using a scripting language like
>> Python and replace usage of the AWK scripts?
> 
> I wonder what will be the expected way to obtain a suitable version of
> Python if one is not available on the build machine? With awk I can
> build it from source pretty much anywhere. Is building newer versions
> of Python on older targets a similarly straightforward process (somehow
> I doubt it)? What about Windows?

It's the same sort of thing: untar the sources, configure, make, make install.  
The code is larger than awk but the process is no more difficult.

For Windows there are pre-built kits.  Ditto for a number of other popular 
operating systems.

paul



Re: [RFC] Adding Python as a possible language and it's usage

2018-07-17 Thread Paul Koning



> On Jul 17, 2018, at 8:23 PM, David Malcolm  wrote:
> 
>>> Hi.
>>> 
>>> I've recently touched AWK option generate machinery and it's quite
>>> unpleasant to make any adjustments. My question is simple: can we
>>> starting using a scripting language like Python and replace usage
>>> of
>>> the AWK scripts? It's probably question for Steering committee, but
>>> I
>>> would like to see feedback from community
>>> I'm looking forward to a feedback.
>>> Martin

David gave a number of good arguments.  I support Martin's proposal, both as to 
replacing AWK and specifically the choice of Python for that purpose.

Python fits the bill very well in my experience.  I've used it to write several 
large programs, including such non-obvious ones as two network protocol stack 
implementations.

In roughly 40 years, and roughly 40 programming languages, I've only twice 
encountered a language where I could go from knowing nothing at all to writing 
a substantial real world program in just one week: Pascal (in college) and 
Python (about 15 years ago).  This is why Python became my language of choice 
whenever I don't need the speed or small memory footprint of C/C++.

paul




Re: ICE building a libsupc++ file, pdp11 target

2018-07-17 Thread Paul Koning



> On Jul 17, 2018, at 9:38 AM, Richard Biener  
> wrote:
> 
>> ...
>> lldb? eh ... ;)

Yes, gdb is hard to make work on Mac OS.

>> anyhow, this is
>> 
>> namespace std
>> {
>> 
>> # 56 
>> "/Users/pkoning/Documents/svn/buildpdp/pdp11-aout/libstdc++-v3/include/type_traits"
>> 3
>>  template
>>struct integral_constant
>>{
>>  static constexpr _Tp value = __v;
>> ^^^
>> 
>> which should have TREE_PUBLIC set.  My next step would be to watch how
>> this flag changes (if it does...)
>> 
>> break at ggc-page.c:1442 (the return stmt of ggc_internal_alloc)
>> conditional on result == 0x14269c750
>> and then watch *>base.public_flag printing said flag when
>> the watchpoint hits
>> (because you're watching the whole integer containing the bitfield bit).
>> 
>> If that doesn't go anywhere try reducing the source file using creduce
>> or by other means.
>> 
>> Maybe look at reset_decl_linkage () and visibility support in general.
> 
> Oh, and generally a grep for 'targetm' in cp/ might reveal bits that sound
> interesting.

That reveals some things but nothing jumps out at me.  However... pdp11 is an 
a.out target, not an ELF target.  Would that explain the problem?  If yes, is 
there a workaround (short of implementing ELF)?

paul




Re: ICE building a libsupc++ file, pdp11 target

2018-07-16 Thread Paul Koning



> On Jul 13, 2018, at 3:12 PM, U.Mutlu  wrote:
> 
> Paul Koning wrote on 07/13/2018 08:56 PM:
>> 
>> 
>>> On Jul 13, 2018, at 2:52 PM, U.Mutlu  wrote:
>>> 
>>> Paul Koning wrote on 07/13/2018 08:27 PM:
>>>> I'm trying to see if I can build the pdp11 target for languages other than 
>>>> just C, and the answer is for the most part yes.  But I' running into an 
>>>> ICE I can't figure out.  It's way before the back end comes into the 
>>>> picture as far as I can see, and there's nothing particularly strange 
>>>> looking in the input file that suggests anything.
>>>> 
>>>> Any suggestions on where to look?  The failure is:
>>>> 
>>>> libtool: compile:  /Users/pkoning/Documents/svn/buildpdp+/./gcc/xgcc 
>>>> -shared-libgcc -B/Users/pkoning/Documents/svn/buildpdp+/./gcc -nostdinc++ 
>>>> -L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/src 
>>>> -L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/src/.libs 
>>>> -L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/libsupc++/.libs
>>>>  -B/usr/local/pdp11-aout/pdp11-aout/bin/ 
>>>> -B/usr/local/pdp11-aout/pdp11-aout/lib/ -isystem 
>>>> /usr/local/pdp11-aout/pdp11-aout/include -isystem 
>>>> /usr/local/pdp11-aout/pdp11-aout/sys-include 
>>>> -I/Users/pkoning/Documents/svn/gcc/libstdc++-v3/../libgcc 
>>>> -I/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/include/pdp11-aout
>>>>  -I/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/include 
>>>> -I/Users/pkoning/Documents/svn/gcc/libstdc++-v3/libsupc++ 
>>>> -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
>>>> -fdiagnostics-show-location=once -frandom-seed=new_opa.lo -g -O2 
>>>> -std=gnu++1z -c ../../../../gcc/l
> ib
>>> stdc++-v3/libsupc++/new_opa.cc -o new_opa.o
>>>> cc1plus: warning: -Wabi won't warn about anything [-Wabi]
>>>> cc1plus: note: -Wabi warns about differences from the most up-to-date ABI, 
>>>> which is also used by default
>>>> cc1plus: note: use e.g. -Wabi=11 to warn about changes from GCC 7
>>>> ../../../../gcc/libstdc++-v3/libsupc++/new_opa.cc:112:1: internal compiler 
>>>> error: in import_export_decl, at cp/decl2.c:2877
>>>>  }
>>>>  ^
>>>> libbacktrace could not find executable to open
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>> make[3]: *** [new_opa.lo] Error 1
>>> 
>>> 
>>> It's failing at the last gcc_assert() below (file 
>>> ../gcc_src/gcc/cp/decl2.c:2877 ):
>> 
>> Sorry, I should have been more explicit.  I saw that, but my question is: 
>> what is the underlying problem that triggers the assert?  The comment is not 
>> much help to me.  And more specifically: what could a target be doing wrong 
>> that would make an early stage of the compiler fail like this on what seems 
>> like a pretty straightforward source file?
>> 
>> Many of the other libstdc++ bits compile just fine, as do plenty of 
>> testsuite cases and some test files of my own.
>> 
> 
> /* In a VAR_DECL, FUNCTION_DECL, NAMESPACE_DECL or TYPE_DECL,
>   nonzero means name is to be accessible from outside this translation unit.
>   In an IDENTIFIER_NODE, nonzero means an external declaration
>   accessible from outside this translation unit was previously seen
>   for this name in an inner scope.  */
> #define TREE_PUBLIC(NODE) ((NODE)->base.public_flag)
> 
> 
> Ie. it has todo with the value of the member var public_flag of the tree decl.

I'm still on the same question as before.  Why do I get an ICE in the tree 
phase of the compiler, complaining about flags of a declaration, based on 
something I apparently have wrong in my target description?  I can build C++ 
for other targets, so this isn't a general bug.  But I'm not used to target 
(back end) stuff affecting the compiler before I even get to the RTL part.  And 
in this case, looking at the failing code gives me no clue at all.  I can't do 
anything with the "tree" object to find out what it describes; it's a VAR_DECL 
but I don't know what to look at. I tried turning on tree dump files, those 
gave no clue either.  And there is nothing in the manuals.

paul



[PATCH, committed] Fix rtl check error in pdp11

2018-07-14 Thread Paul Koning
When building with checking enabled, there were check failures in 
pdp11_rtx_costs.  This patch fixes two errors.

Committed.

paul

ChangeLog:

2018-07-14  Paul Koning  

* config/pdp11/pdp11.c (pdp11_rtx_costs): Bugfixes.

Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 262604)
+++ config/pdp11/pdp11.c(working copy)
@@ -1014,6 +1014,7 @@ pdp11_rtx_costs (rtx x, machine_mode mode, int out
   const int code = GET_CODE (x);
   const int asize = (mode == QImode) ? 2 : GET_MODE_SIZE (mode);
   rtx src, dest;
+  const char *fmt;
   
   switch (code)
 {
@@ -1035,6 +1036,14 @@ pdp11_rtx_costs (rtx x, machine_mode mode, int out
   *total = pdp11_addr_cost (x, mode, ADDR_SPACE_GENERIC, speed);
   return true;
 }
+  if (GET_RTX_LENGTH (code) == 0)
+{
+  if (speed)
+   *total = 0;
+  else
+   *total = 2;
+  return true;
+}
 
   /* Pick up source and dest.  We don't necessarily use the standard
  recursion in rtx_costs to figure the cost, because that would
@@ -1041,6 +1050,15 @@ pdp11_rtx_costs (rtx x, machine_mode mode, int out
  count the destination operand twice for three-operand insns.
  Also, this way we can catch special cases like move of zero, or
  add one.  */
+  fmt = GET_RTX_FORMAT (code);
+  if (fmt[0] != 'e' || (GET_RTX_LENGTH (code) > 1 && fmt[1] != 'e'))
+{
+  if (speed)
+   *total = 0;
+  else
+   *total = 2;
+  return true;
+}
   if (GET_RTX_LENGTH (code) > 1)
 src = XEXP (x, 1);
   dest = XEXP (x, 0);



Re: ICE building a libsupc++ file, pdp11 target

2018-07-13 Thread Paul Koning



> On Jul 13, 2018, at 2:52 PM, U.Mutlu  wrote:
> 
> Paul Koning wrote on 07/13/2018 08:27 PM:
>> I'm trying to see if I can build the pdp11 target for languages other than 
>> just C, and the answer is for the most part yes.  But I' running into an ICE 
>> I can't figure out.  It's way before the back end comes into the picture as 
>> far as I can see, and there's nothing particularly strange looking in the 
>> input file that suggests anything.
>> 
>> Any suggestions on where to look?  The failure is:
>> 
>> libtool: compile:  /Users/pkoning/Documents/svn/buildpdp+/./gcc/xgcc 
>> -shared-libgcc -B/Users/pkoning/Documents/svn/buildpdp+/./gcc -nostdinc++ 
>> -L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/src 
>> -L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/src/.libs 
>> -L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/libsupc++/.libs
>>  -B/usr/local/pdp11-aout/pdp11-aout/bin/ 
>> -B/usr/local/pdp11-aout/pdp11-aout/lib/ -isystem 
>> /usr/local/pdp11-aout/pdp11-aout/include -isystem 
>> /usr/local/pdp11-aout/pdp11-aout/sys-include 
>> -I/Users/pkoning/Documents/svn/gcc/libstdc++-v3/../libgcc 
>> -I/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/include/pdp11-aout
>>  -I/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/include 
>> -I/Users/pkoning/Documents/svn/gcc/libstdc++-v3/libsupc++ 
>> -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
>> -fdiagnostics-show-location=once -frandom-seed=new_opa.lo -g -O2 
>> -std=gnu++1z -c ../../../../gcc/lib
> stdc++-v3/libsupc++/new_opa.cc -o new_opa.o
>> cc1plus: warning: -Wabi won't warn about anything [-Wabi]
>> cc1plus: note: -Wabi warns about differences from the most up-to-date ABI, 
>> which is also used by default
>> cc1plus: note: use e.g. -Wabi=11 to warn about changes from GCC 7
>> ../../../../gcc/libstdc++-v3/libsupc++/new_opa.cc:112:1: internal compiler 
>> error: in import_export_decl, at cp/decl2.c:2877
>>  }
>>  ^
>> libbacktrace could not find executable to open
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> See <https://gcc.gnu.org/bugs/> for instructions.
>> make[3]: *** [new_opa.lo] Error 1
> 
> 
> It's failing at the last gcc_assert() below (file 
> ../gcc_src/gcc/cp/decl2.c:2877 ):

Sorry, I should have been more explicit.  I saw that, but my question is: what 
is the underlying problem that triggers the assert?  The comment is not much 
help to me.  And more specifically: what could a target be doing wrong that 
would make an early stage of the compiler fail like this on what seems like a 
pretty straightforward source file?

Many of the other libstdc++ bits compile just fine, as do plenty of testsuite 
cases and some test files of my own.

paul



ICE building a libsupc++ file, pdp11 target

2018-07-13 Thread Paul Koning
I'm trying to see if I can build the pdp11 target for languages other than just 
C, and the answer is for the most part yes.  But I' running into an ICE I can't 
figure out.  It's way before the back end comes into the picture as far as I 
can see, and there's nothing particularly strange looking in the input file 
that suggests anything.

Any suggestions on where to look?  The failure is:

libtool: compile:  /Users/pkoning/Documents/svn/buildpdp+/./gcc/xgcc 
-shared-libgcc -B/Users/pkoning/Documents/svn/buildpdp+/./gcc -nostdinc++ 
-L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/src 
-L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/src/.libs 
-L/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/libsupc++/.libs
 -B/usr/local/pdp11-aout/pdp11-aout/bin/ 
-B/usr/local/pdp11-aout/pdp11-aout/lib/ -isystem 
/usr/local/pdp11-aout/pdp11-aout/include -isystem 
/usr/local/pdp11-aout/pdp11-aout/sys-include 
-I/Users/pkoning/Documents/svn/gcc/libstdc++-v3/../libgcc 
-I/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/include/pdp11-aout
 -I/Users/pkoning/Documents/svn/buildpdp+/pdp11-aout/libstdc++-v3/include 
-I/Users/pkoning/Documents/svn/gcc/libstdc++-v3/libsupc++ 
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
-fdiagnostics-show-location=once -frandom-seed=new_opa.lo -g -O2 -std=gnu++1z 
-c ../../../../gcc/libstdc++-v3/libsupc++/new_opa.cc -o new_opa.o
cc1plus: warning: -Wabi won't warn about anything [-Wabi]
cc1plus: note: -Wabi warns about differences from the most up-to-date ABI, 
which is also used by default
cc1plus: note: use e.g. -Wabi=11 to warn about changes from GCC 7
../../../../gcc/libstdc++-v3/libsupc++/new_opa.cc:112:1: internal compiler 
error: in import_export_decl, at cp/decl2.c:2877
 }
 ^
libbacktrace could not find executable to open
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[3]: *** [new_opa.lo] Error 1

paul



Re: How does a target make Fortran work?

2018-07-13 Thread Paul Koning



> On Jul 13, 2018, at 9:36 AM, Janus Weil  wrote:
> 
> Hi Paul,
> 
> Fortran problems are best discussed on fort...@gcc.gnu.org 
> <mailto:fort...@gcc.gnu.org> (in CC) ...
> 
> 2018-07-12 21:22 GMT+02:00 Paul Koning  <mailto:paulkon...@comcast.net>>:
>> I tried to rebuild for target pdp11 with fortran enabled (in the past I've 
>> just enabled C).  It builds fine but the resulting compiler crashes at 
>> startup:
>> 
>> Paul-Konings-MacBook-Pro:gcc pkoning$ ./xgcc -B. -O2 -S ../../hello.f
>> f951: internal compiler error: gfc_validate_kind(): Got bad kind
>> libbacktrace could not find executable to open
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> See <https://gcc.gnu.org/bugs/> for instructions.
>> 
>> "hello.f" is the typical "hello world" program.  Some debugging got me to 
>> here:
>> 
>> * thread #1, queue = 'com.apple.main-thread', stop reason = step over
>>frame #0: 0x0001001c1c93 f951`gfc_init_kinds() [inlined] 
>> gfc_validate_kind(type=BT_REAL, kind=, may_fail=false) at 
>> trans-types.c:814 [opt]
>>   811  }
>>   812
>>   813if (rc < 0 && !may_fail)
>> -> 814  gfc_internal_error ("gfc_validate_kind(): Got bad kind");
>>   815
>>   816return rc;
>>   817  }
>> Target 0: (f951) stopped.
>> 
>> So it's unhappy about some "kind", for BT_REAL.  I'm not sure what that 
>> means.  Is it mapping the available data types to Fortran "KIND" values?  If 
>> so, is there something the target has to do for this to work?  Note that 
>> this isn't an IEEE target, so if the initialization is expecting IEEE float 
>> then that may account for it.  But I have no idea what to do here, and the 
>> manual offers no clue.
> 
> A "kind" in Fortran is the size of the data type in bytes. E.g.
> "real(kind=4)" would be a 'single-precision' 32-bit float number,
> while "real(kind=8)" would be a 'double-precision' 64-bit float
> number. The default kind for REAL variables is 4 in gfortran, see:
> 
> https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gfortran/KIND-Type-Parameters.html 
> <https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gfortran/KIND-Type-Parameters.html>
> 
> Since pdp-11 seems to be a 16-bit platform, such a 32-bit
> floating-point type might not be available there? Possibly it makes
> sense to set the default REAL kind to 2 for such a target?

No, because there isn't a 16 bit float type on that machine.  But your answer 
pointed me straight to the problem.

The pdp11 back end has a command line option to choose whether the C "float" 
type is SFmode or DFmode (32 bit or 64 bit).  For various reasons, it defaults 
to 64, which of course means that "float" and "double" are the same machine 
type -- the only float is "real(kind=8)".  If I override the default with 
-mfloat32, Fortran is happy.  Maybe the right answer is to change the default, 
at least once I take away the historic reason for it.

Thanks!

paul



[PATCH, committed] Fix typo in pdp11 target

2018-07-12 Thread Paul Koning
This fixes a typo in the output of a ".set" directive.

Committed.

paul

ChangeLog:

2018-07-12  Paul Koning  

* config/pdp11/pdp11.c (pdp11_output_def): Fix typo in .set
directive.

Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 262602)
+++ config/pdp11/pdp11.c(working copy)
@@ -2437,7 +2437,7 @@ pdp11_output_def (FILE *file, const char *label1,
 }
   else
 {
-  fputs (".set", file);
+  fputs ("\t.set\t", file);
   assemble_name (file, label1);
   putc (',', file);
   assemble_name (file, label2);



How does a target make Fortran work?

2018-07-12 Thread Paul Koning
I tried to rebuild for target pdp11 with fortran enabled (in the past I've just 
enabled C).  It builds fine but the resulting compiler crashes at startup:

Paul-Konings-MacBook-Pro:gcc pkoning$ ./xgcc -B. -O2 -S ../../hello.f
f951: internal compiler error: gfc_validate_kind(): Got bad kind
libbacktrace could not find executable to open
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

"hello.f" is the typical "hello world" program.  Some debugging got me to here:

* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0001001c1c93 f951`gfc_init_kinds() [inlined] 
gfc_validate_kind(type=BT_REAL, kind=, may_fail=false) at 
trans-types.c:814 [opt]
   811  }
   812  
   813if (rc < 0 && !may_fail)
-> 814  gfc_internal_error ("gfc_validate_kind(): Got bad kind");
   815  
   816return rc;
   817  }
Target 0: (f951) stopped.

So it's unhappy about some "kind", for BT_REAL.  I'm not sure what that means.  
Is it mapping the available data types to Fortran "KIND" values?  If so, is 
there something the target has to do for this to work?  Note that this isn't an 
IEEE target, so if the initialization is expecting IEEE float then that may 
account for it.  But I have no idea what to do here, and the manual offers no 
clue.

paul



Re: [PATCH] doc: update looping constructs

2018-07-12 Thread Paul Koning



> On Jul 12, 2018, at 1:42 PM, Jeff Law  wrote:
> 
> On 07/12/2018 11:17 AM, Paul Koning wrote:
>> 
>> 
>>> On Jul 12, 2018, at 12:55 PM, Jeff Law  wrote:
>>> 
>>> On 07/11/2018 06:20 PM, Paul Koning wrote:
>>>> This patch removes the obsolete documentation for
>>>> decrement_and_branch_until_zero.  It also adds detail to the
>>>> description for doloop_end.  In particular, it describes the
>>>> required form of the conditional branch part of the pattern.
>>>> 
>>>> Ok for trunk?
>>>> 
>>>> paul
>>>> 
>>>> ChangeLog:
>>>> 
>>>> 2018-07-11  Paul Koning  
>>>> 
>>>> * doc/rtl.texi (REG_NONNEG): Remove decrement and branch until 
>>>> zero reference, add doloop_end instead. * doc/md.texi
>>>> (decrement_and_branch_until_zero): Remove. (Looping patterns):
>>>> Remove decrement_and_branch_until_zero.  Add detail for
>>>> doloop_end.
>>> OK.  I wonder if we can eliminate REG_NONNEG at this point.  I'm
>>> not sure if it's really being used anymore.  I see a reference in
>>> the old m68k dbra pattern, but that pattern could well be dead at
>>> this point.
>> 
>> The original reasoning is that you'd need the note if you have a
>> machine instruction that does a loop until negative, and you want to
>> confirm that the input is a valid loop count (not negative --
>> otherwise you'd loop once rather than zero times).  If you have a
>> loop instruction that works this way, that still matters.  If someone
>> wants to convert m68000 to use doloop_end, for example.  Or likewise
>> for vax, which has an unnamed looping pattern that could be used but
>> obviously isn't at the moment (since it's unnamed).
> Right.  And that is now dbra works on the m68k.  However I've got little
> confidence we're generating that instruction anymore.  Though I guess
> it's easy enough to check since I can bootstrap m68k with qemu :-)
> 
> It's always struck me as lame that we had to rely on the magic note and
> in the world of doloop.c we ought to be able to express things much more
> clearly.

Agreed.

I'm pretty sure dbra is no longer working.  It certainly didn't do anything 
when I tried to add it in pdp11, and when I asked about it on the gcc list I 
was told that it was removed a decade ago, or thereabouts.

The new technique works fine.  The only drawback is the "register can't be the 
induction variable" limitation, which makes sense if special registers are used 
but would be nice not to have on machines where the register used is a general 
register.

>> Then again... does the code that generates this insn do the checking,
>> i.e., avoid generating doloop_end unless it can confirm that the
>> register is nonnegative?  It doesn't seem to, at least not the code
>> in "doloop_modify" that actually inserts the regnote.  But how can
>> that be valid?  doloop_end by definition loops at least once.  If you
>> pass it a negative count, the loop is run once for "ge" comparisons,
>> and MAXUINT + count times for insns that use the "ne" condition
>> (i.e., all the current ones).  So one would assume that's not
>> possible, i.e., you couldn't come through there unless REG_NONNEG is
>> guaranteed true.
>> 
>> Currently the only doloop_end insns I can find all have the "ne"
>> comparison so the regnote isn't generated.  I'd like to leave this
>> question separate from the doc cleanup.  If we want to remove the
>> note, that's easy enough, and the doc change would be pretty obvious
>> as well.
>> 
>> So is this patch OK to commit?
> SOrry I wasn't clear.  I'm perfectly content to look at killing
> REG_NONNEG independent of the doc cleanup.
> 
> The doc patch is fine to commit now.

Thanks, done.

paul




Re: [PATCH] doc: update looping constructs

2018-07-12 Thread Paul Koning



> On Jul 12, 2018, at 12:55 PM, Jeff Law  wrote:
> 
> On 07/11/2018 06:20 PM, Paul Koning wrote:
>> This patch removes the obsolete documentation for 
>> decrement_and_branch_until_zero.  It also adds detail to the description for 
>> doloop_end.  In particular, it describes the required form of the 
>> conditional branch part of the pattern.
>> 
>> Ok for trunk?
>> 
>>  paul
>> 
>> ChangeLog:
>> 
>> 2018-07-11  Paul Koning  
>> 
>>  * doc/rtl.texi (REG_NONNEG): Remove decrement and branch until
>>  zero reference, add doloop_end instead.
>>  * doc/md.texi (decrement_and_branch_until_zero): Remove.
>>  (Looping patterns): Remove decrement_and_branch_until_zero.  Add
>>  detail for doloop_end.
> OK.  I wonder if we can eliminate REG_NONNEG at this point.  I'm not
> sure if it's really being used anymore.  I see a reference in the old
> m68k dbra pattern, but that pattern could well be dead at this point.

The original reasoning is that you'd need the note if you have a machine 
instruction that does a loop until negative, and you want to confirm that the 
input is a valid loop count (not negative -- otherwise you'd loop once rather 
than zero times).  If you have a loop instruction that works this way, that 
still matters.  If someone wants to convert m68000 to use doloop_end, for 
example.  Or likewise for vax, which has an unnamed looping pattern that could 
be used but obviously isn't at the moment (since it's unnamed).

Then again... does the code that generates this insn do the checking, i.e., 
avoid generating doloop_end unless it can confirm that the register is 
nonnegative?  It doesn't seem to, at least not the code in "doloop_modify" that 
actually inserts the regnote.  But how can that be valid?  doloop_end by 
definition loops at least once.  If you pass it a negative count, the loop is 
run once for "ge" comparisons, and MAXUINT + count times for insns that use the 
"ne" condition (i.e., all the current ones).  So one would assume that's not 
possible, i.e., you couldn't come through there unless REG_NONNEG is guaranteed 
true.

Currently the only doloop_end insns I can find all have the "ne" comparison so 
the regnote isn't generated.  I'd like to leave this question separate from the 
doc cleanup.  If we want to remove the note, that's easy enough, and the doc 
change would be pretty obvious as well.

So is this patch OK to commit?

paul



[PATCH] doc: update looping constructs

2018-07-11 Thread Paul Koning
This patch removes the obsolete documentation for 
decrement_and_branch_until_zero.  It also adds detail to the description for 
doloop_end.  In particular, it describes the required form of the conditional 
branch part of the pattern.

Ok for trunk?

paul

ChangeLog:

2018-07-11  Paul Koning  

* doc/rtl.texi (REG_NONNEG): Remove decrement and branch until
zero reference, add doloop_end instead.
* doc/md.texi (decrement_and_branch_until_zero): Remove.
(Looping patterns): Remove decrement_and_branch_until_zero.  Add
detail for doloop_end.

Index: doc/md.texi
===
--- doc/md.texi (revision 262562)
+++ doc/md.texi (working copy)
@@ -6681,17 +6681,6 @@ second operand, but you should incorporate it in t
 that the jump optimizer will not delete the table as unreachable code.
 
 
-@cindex @code{decrement_and_branch_until_zero} instruction pattern
-@item @samp{decrement_and_branch_until_zero}
-Conditional branch instruction that decrements a register and
-jumps if the register is nonzero.  Operand 0 is the register to
-decrement and test; operand 1 is the label to jump to if the
-register is nonzero.  @xref{Looping Patterns}.
-
-This optional instruction pattern is only used by the combiner,
-typically for loops reversed by the loop optimizer when strength
-reduction is enabled.
-
 @cindex @code{doloop_end} instruction pattern
 @item @samp{doloop_end}
 Conditional branch instruction that decrements a register and
@@ -7515,67 +7504,12 @@ iterations.  This avoids the need for fetching and
 @samp{dbra}-like instruction and avoids pipeline stalls associated with
 the jump.
 
-GCC has three special named patterns to support low overhead looping.
-They are @samp{decrement_and_branch_until_zero}, @samp{doloop_begin},
-and @samp{doloop_end}.  The first pattern,
-@samp{decrement_and_branch_until_zero}, is not emitted during RTL
-generation but may be emitted during the instruction combination phase.
-This requires the assistance of the loop optimizer, using information
-collected during strength reduction, to reverse a loop to count down to
-zero.  Some targets also require the loop optimizer to add a
-@code{REG_NONNEG} note to indicate that the iteration count is always
-positive.  This is needed if the target performs a signed loop
-termination test.  For example, the 68000 uses a pattern similar to the
-following for its @code{dbra} instruction:
+GCC has two special named patterns to support low overhead looping.
+They are @samp{doloop_begin} and @samp{doloop_end}.  These are emitted
+by the loop optimizer for certain well-behaved loops with a finite
+number of loop iterations using information collected during strength
+reduction.
 
-@smallexample
-@group
-(define_insn "decrement_and_branch_until_zero"
-  [(set (pc)
-(if_then_else
-  (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
-   (const_int -1))
-  (const_int 0))
-  (label_ref (match_operand 1 "" ""))
-  (pc)))
-   (set (match_dup 0)
-(plus:SI (match_dup 0)
- (const_int -1)))]
-  "find_reg_note (insn, REG_NONNEG, 0)"
-  "@dots{}")
-@end group
-@end smallexample
-
-Note that since the insn is both a jump insn and has an output, it must
-deal with its own reloads, hence the `m' constraints.  Also note that
-since this insn is generated by the instruction combination phase
-combining two sequential insns together into an implicit parallel insn,
-the iteration counter needs to be biased by the same amount as the
-decrement operation, in this case @minus{}1.  Note that the following similar
-pattern will not be matched by the combiner.
-
-@smallexample
-@group
-(define_insn "decrement_and_branch_until_zero"
-  [(set (pc)
-(if_then_else
-  (ge (match_operand:SI 0 "general_operand" "+d*am")
-  (const_int 1))
-  (label_ref (match_operand 1 "" ""))
-  (pc)))
-   (set (match_dup 0)
-(plus:SI (match_dup 0)
- (const_int -1)))]
-  "find_reg_note (insn, REG_NONNEG, 0)"
-  "@dots{}")
-@end group
-@end smallexample
-
-The other two special looping patterns, @samp{doloop_begin} and
-@samp{doloop_end}, are emitted by the loop optimizer for certain
-well-behaved loops with a finite number of loop iterations using
-information collected during strength reduction.
-
 The @samp{doloop_end} pattern describes the actual looping instruction
 (or the implicit looping operation) and the @samp{doloop_begin} pattern
 is an optional companion pattern that can be used for initialization
@@ -7594,15 +7528,65 @@ desired special iteration counter register was not
 machine dependent reorg pass could emit a traditional compare and jump
 instruction pair.
 
-The essential differenc

Re: ChangeLog's: do we have to?

2018-07-10 Thread Paul Koning



> On Jul 10, 2018, at 2:18 PM, NightStrike  wrote:
> 
> On Fri, Jul 6, 2018 at 1:17 AM, Siddhesh Poyarekar  
> wrote:
>> ...
>> 
>> We had discussed making addition of ChangeLog entries into the commit
>> message mandatory but the issue there is that commit logs cannot be (or more
>> precisely, should not be) modified after they're pushed so errors in
>> ChangeLog entries will remain.  Carlos had suggested git notes, but I don't
>> know off the top of my head how they work; I vaguely remember them being
>> editable.
> 
> You could do this now under svn, I believe, as svn allows --rev-prop
> changes to modify the log.

Yes, and possibly other rev properties as well, provided the server hook script 
is changed to enable whatever you want to allow.  By default all revprop 
modifications are blocked.

paul



[PATCH] doc: add missing "mode" type attribute

2018-07-10 Thread Paul Koning
"mode" is documented as a variable attribute but not as a type attribute.  This 
fixes that omission.  I simply copied the other text, it seemed suitable as it 
stands.

The attributes are normally listed in alphabetical order but "mode" was out of 
order in the variable attributes.

Ok for trunk?

paul

ChangeLog:

2018-07-10  Paul Koning  

* doc/extend.texi (Common Variable Attributes): Move "mode" into
alphabetical order.
(Common Type Attributes): Add "mode" attribute.

Index: doc/extend.texi
===
--- doc/extend.texi (revision 262540)
+++ doc/extend.texi (working copy)
@@ -6123,6 +6123,19 @@ types (@pxref{Common Function Attributes},
 The message attached to the attribute is affected by the setting of
 the @option{-fmessage-length} option.
 
+@item mode (@var{mode})
+@cindex @code{mode} variable attribute
+This attribute specifies the data type for the declaration---whichever
+type corresponds to the mode @var{mode}.  This in effect lets you
+request an integer or floating-point type according to its width.
+
+@xref{Machine Modes,,, gccint, GNU Compiler Collection (GCC) Internals},
+for a list of the possible keywords for @var{mode}.
+You may also specify a mode of @code{byte} or @code{__byte__} to
+indicate the mode corresponding to a one-byte integer, @code{word} or
+@code{__word__} for the mode of a one-word integer, and @code{pointer}
+or @code{__pointer__} for the mode used to represent pointers.
+
 @item nonstring
 @cindex @code{nonstring} variable attribute
 The @code{nonstring} variable attribute specifies that an object or member
@@ -6158,19 +6171,6 @@ int f (struct Data *pd, const char *s)
 @}
 @end smallexample
 
-@item mode (@var{mode})
-@cindex @code{mode} variable attribute
-This attribute specifies the data type for the declaration---whichever
-type corresponds to the mode @var{mode}.  This in effect lets you
-request an integer or floating-point type according to its width.
-
-@xref{Machine Modes,,, gccint, GNU Compiler Collection (GCC) Internals},
-for a list of the possible keywords for @var{mode}.
-You may also specify a mode of @code{byte} or @code{__byte__} to
-indicate the mode corresponding to a one-byte integer, @code{word} or
-@code{__word__} for the mode of a one-word integer, and @code{pointer}
-or @code{__pointer__} for the mode used to represent pointers.
-
 @item packed
 @cindex @code{packed} variable attribute
 The @code{packed} attribute specifies that a variable or structure field
@@ -7112,6 +7112,19 @@ declaration, the above program would abort when co
 @option{-fstrict-aliasing}, which is on by default at @option{-O2} or
 above.
 
+@item mode (@var{mode})
+@cindex @code{mode} type attribute
+This attribute specifies the data type for the declaration---whichever
+type corresponds to the mode @var{mode}.  This in effect lets you
+request an integer or floating-point type according to its width.
+
+@xref{Machine Modes,,, gccint, GNU Compiler Collection (GCC) Internals},
+for a list of the possible keywords for @var{mode}.
+You may also specify a mode of @code{byte} or @code{__byte__} to
+indicate the mode corresponding to a one-byte integer, @code{word} or
+@code{__word__} for the mode of a one-word integer, and @code{pointer}
+or @code{__pointer__} for the mode used to represent pointers.
+
 @item packed
 @cindex @code{packed} type attribute
 This attribute, attached to @code{struct} or @code{union} type
Index: doc/md.texi
===
--- doc/md.texi (revision 262540)
+++ doc/md.texi (working copy)
@@ -10263,7 +10263,11 @@ the expression from the original pattern, which ma
 @code{match_operand N} from the input pattern.  As a consequence,
 @code{match_dup} cannot be used to point to @code{match_operand}s from
 the output pattern, it should always refer to a @code{match_operand}
-from the input pattern.
+from the input pattern.  If a @code{match_dup N} occurs more than once
+in the output template, its first occurrence is replaced with the
+expression from the original pattern, and the subsequent expressions
+are replaced with @code{match_dup N}, i.e., a reference to the first
+expression.
 
 In the output template one can refer to the expressions from the
 original pattern and create new ones.  For instance, some operands could



[PATCH, committed] Improve code generation for pdp11 target

2018-07-09 Thread Paul Koning
This patch improves the generated code for the pdp11 target.

Committed.

paul

ChangeLog:

2018-07-09  Paul Koning  

* config/pdp11/pdp11.c (pdp11_addr_cost): New function.
(pdp11_insn_cost): New function.
(pdp11_md_asm_adjust): New function.
(TARGET_INVALID_WITHIN_DOLOOP): Define.
(pdp11_rtx_costs): Update to match machine better.
(output_addr_const_pdp11): Correct format mismatch warnings.
* config/pdp11/pdp11.h (SLOW_BYTE_ACCESS): Correct definition.
* config/pdp11/pdp11.md: General change to add base_cost and/or
length attributes for use by new pdp11_insn_cost function.
(MIN_BRANCH): Correct definition.
(MIN_SOB): Ditto.
(doloop_end): Use standard pattern name for looping pattern.
(doloop_end_nocc): New.
(movsf): Add another constraint alternative.
(zero_extendqihi2): Add constraint alternatives for not in place
extend.
(zero_extendhisi2): Remove.
(shift patterns): Add CC handling variants.
(bswaphi2): New.
(bswapsi2): New.
(rothi3): New.
(define_peephole2): New peephole to recognize mov that sets CC for
subsequent test.

Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 262518)
+++ config/pdp11/pdp11.c(working copy)
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
+#include "insn-attr.h"
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
@@ -150,6 +151,11 @@ decode_pdp11_d (const struct real_format *fmt ATTR
 static const char *singlemove_string (rtx *);
 static bool pdp11_assemble_integer (rtx, unsigned int, int);
 static bool pdp11_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static int pdp11_addr_cost (rtx, machine_mode, addr_space_t, bool);
+static int pdp11_insn_cost (rtx_insn *insn, bool speed);
+static rtx_insn *pdp11_md_asm_adjust (vec &, vec &,
+ vec &,
+ vec &, HARD_REG_SET &);
 static bool pdp11_return_in_memory (const_tree, const_tree);
 static rtx pdp11_function_value (const_tree, const_tree, bool);
 static rtx pdp11_libcall_value (machine_mode, const_rtx);
@@ -174,6 +180,8 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #undef TARGET_ASM_INTEGER
 #define TARGET_ASM_INTEGER pdp11_assemble_integer
 
+/* These two apply to Unix and GNU assembler; for DEC, they are
+   overridden during option processing.  */
 #undef TARGET_ASM_OPEN_PAREN
 #define TARGET_ASM_OPEN_PAREN "["
 #undef TARGET_ASM_CLOSE_PAREN
@@ -182,6 +190,15 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS pdp11_rtx_costs
 
+#undef  TARGET_ADDRESS_COST
+#define TARGET_ADDRESS_COST pdp11_addr_cost
+
+#undef  TARGET_INSN_COST
+#define TARGET_INSN_COST pdp11_insn_cost
+
+#undef  TARGET_MD_ASM_ADJUST
+#define TARGET_MD_ASM_ADJUST pdp11_md_asm_adjust
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG pdp11_function_arg
 #undef TARGET_FUNCTION_ARG_ADVANCE
@@ -271,6 +288,9 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 
 #undef  TARGET_CAN_CHANGE_MODE_CLASS
 #define TARGET_CAN_CHANGE_MODE_CLASS pdp11_can_change_mode_class
+
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP hook_constcharptr_const_rtx_insn_null
 
 /* A helper function to determine if REGNO should be saved in the
current function's stack frame.  */
@@ -968,12 +988,8 @@ pdp11_assemble_integer (rtx x, unsigned int size,
 }
 
 
-/* Register to register moves are cheap if both are general registers.
-   The same is true for FPU, but there we return cost of 3 rather than
-   2 to make reload look at the constraints.  The raeson is that
-   load/store double require extra care since load touches condition
-   codes and store doesn't, which is (partly anyway) described by
-   constraints.  */
+/* Register to register moves are cheap if both are general
+   registers.  */
 static int 
 pdp11_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
  reg_class_t c1, reg_class_t c2)
@@ -983,151 +999,270 @@ pdp11_register_move_cost (machine_mode mode ATTRIB
 return 2;
   else if ((c1 >= LOAD_FPU_REGS && c1 <= FPU_REGS && c2 == LOAD_FPU_REGS) ||
   (c2 >= LOAD_FPU_REGS && c2 <= FPU_REGS && c1 == LOAD_FPU_REGS))
-return 3;
+return 2;
   else
 return 22;
 }
 
-
+/* This tries to approximate what pdp11_insn_cost would do, but
+   without visibility into the actual instruction being generated it's
+   inevitably a rough approximation.  */
 static bool
-pdp11_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,

[PATCH, doc] Small clarification on define_subst

2018-07-08 Thread Paul Koning
In doing CCmode work I was confused how define_subst handles cases where the 
same argument appears more than once.  The attached clarifies this.

Ok for trunk?

paul

ChangeLog:

2018-07-08  Paul Koning  

* doc/md.texi (define_subst): Document how multiple occurrences of
the same argument in the replacement pattern are handled.

Index: doc/md.texi
===
--- doc/md.texi (revision 262505)
+++ doc/md.texi (working copy)
@@ -10263,7 +10263,11 @@ the expression from the original pattern, which ma
 @code{match_operand N} from the input pattern.  As a consequence,
 @code{match_dup} cannot be used to point to @code{match_operand}s from
 the output pattern, it should always refer to a @code{match_operand}
-from the input pattern.
+from the input pattern.  If a @code{match_dup N} occurs more than once
+in the output template, its first occurrence is replaced with the
+expression from the original pattern, and the subsequent expressions
+are replaced with @code{match_dup N}, i.e., a reference to the first
+expression.
 
 In the output template one can refer to the expressions from the
 original pattern and create new ones.  For instance, some operands could



Re: [PATCH] doc clarification: DONE and FAIL in define_split and define_peephole2

2018-07-06 Thread Paul Koning



> On Jul 6, 2018, at 12:20 PM, Richard Sandiford  
> wrote:
> 
> Double empty line.
> 
> OK otherwise, thanks.  (Think this counts as a gen* patch.)
> 
> Richard

Thanks.  Committed as shown below.

paul

ChangeLog:

2018-07-06  Paul Koning  

* doc/md.texi (define_split): Document DONE and FAIL.
(define_peephole2): Ditto.

Index: doc/md.texi
===
--- doc/md.texi (revision 262478)
+++ doc/md.texi (working copy)
@@ -8060,6 +8060,30 @@ those in @code{define_expand}, however, these stat
 generate any new pseudo-registers.  Once reload has completed, they also
 must not allocate any space in the stack frame.
 
+There are two special macros defined for use in the preparation statements:
+@code{DONE} and @code{FAIL}.  Use them with a following semicolon,
+as a statement.
+
+@table @code
+
+@findex DONE
+@item DONE
+Use the @code{DONE} macro to end RTL generation for the splitter.  The
+only RTL insns generated as replacement for the matched input insn will
+be those already emitted by explicit calls to @code{emit_insn} within
+the preparation statements; the replacement pattern is not used.
+
+@findex FAIL
+@item FAIL
+Make the @code{define_split} fail on this occasion.  When a @code{define_split}
+fails, it means that the splitter was not truly available for the inputs
+it was given, and the input insn will not be split.
+@end table
+
+If the preparation falls through (invokes neither @code{DONE} nor
+@code{FAIL}), then the @code{define_split} uses the replacement
+template.
+
 Patterns are matched against @var{insn-pattern} in two different
 circumstances.  If an insn needs to be split for delay slot scheduling
 or insn scheduling, the insn is already known to be valid, which means
@@ -8615,6 +8639,33 @@ so here's a silly made-up example:
   "")
 @end smallexample
 
+There are two special macros defined for use in the preparation statements:
+@code{DONE} and @code{FAIL}.  Use them with a following semicolon,
+as a statement.
+
+@table @code
+
+@findex DONE
+@item DONE
+Use the @code{DONE} macro to end RTL generation for the peephole.  The
+only RTL insns generated as replacement for the matched input insn will
+be those already emitted by explicit calls to @code{emit_insn} within
+the preparation statements; the replacement pattern is not used.
+
+@findex FAIL
+@item FAIL
+Make the @code{define_peephole2} fail on this occasion.  When a 
@code{define_peephole2}
+fails, it means that the replacement was not truly available for the
+particular inputs it was given.  In that case, GCC may still apply a
+later @code{define_peephole2} that also matches the given insn pattern.
+(Note that this is different from @code{define_split}, where @code{FAIL}
+prevents the input insn from being split at all.)
+@end table
+
+If the preparation falls through (invokes neither @code{DONE} nor
+@code{FAIL}), then the @code{define_peephole2} uses the replacement
+template.
+
 @noindent
 If we had not added the @code{(match_dup 4)} in the middle of the input
 sequence, it might have been the case that the register we chose at the



Re: [PATCH] doc clarification: DONE and FAIL in define_split and define_peephole2

2018-07-06 Thread Paul Koning
Thanks Richard.  Some comments, and an updated proposed patch.

paul


> On Jul 6, 2018, at 9:04 AM, Richard Sandiford  
> wrote:
> 
> ...
>> @@ -8232,6 +8256,15 @@ functionality as two separate @code{define_insn} a
>> patterns.  It exists for compactness, and as a maintenance tool to prevent
>> having to ensure the two patterns' templates match.
>> 
>> +In @code{define_insn_and_split}, the output template is usually simply
>> +@samp{#} since the assembly output is done by @code{define_insn}
>> +statements matching the generated insns, not by this
>> +@code{define_insn_and_split} statement.  But if @code{FAIL} is used in
>> +the preparation statements for certain input insns, those will not be
>> +split and during assembly output will again match this
>> +@code{define_insn_and_split}.  In that case, the appropriate assembly
>> +output statements are needed in the output template.
>> +
> 
> I agree "#" on its own is relatively common, but it's also not that
> unusual to have a define_insn_and_split in which the define_insn part
> handles simple alternatives directly and leaves more complex ones to
> be split.  Maybe that's more common on RISC-like targets.
> 
> Also, the define_split matches the template independently of the
> define_insn, so it can sometimes split insns that match an earlier
> define_insn rather than the one in the define_insn_and_split.
> (That might be bad practice.)  So using "#" and FAIL together is valid
> if the FAIL only happens for cases that match earlier define_insns.
> 
> Another case is when the define_split condition doesn't start with
> "&&" and is less strict than the define_insn condition.  This can
> be useful if the define_split is supposed to match patterns created
> by combine.
> 
> So maybe we should instead expand the FAIL documentation to say that
> a define_split must not FAIL when splitting an instruction whose
> output template is "#".

I played with this a bit and couldn't come up with a good wording.  The case is 
already covered; I was trying to make it clearer but your comments indicate the 
picture is bigger than I thought.

>> @@ -8615,6 +8648,31 @@ so here's a silly made-up example:
>>   "")
>> @end smallexample
>> 
>> +There are two special macros defined for use in the preparation statements:
>> +@code{DONE} and @code{FAIL}.  Use them with a following semicolon,
>> +as a statement.
>> +
>> +@table @code
>> +
>> +@findex DONE
>> +@item DONE
>> +Use the @code{DONE} macro to end RTL generation for the peephole.  The
>> +only RTL insns generated as replacement for the matched input insn will
>> +be those already emitted by explicit calls to @code{emit_insn} within
>> +the preparation statements; the replacement pattern is not used.
>> +
>> +@findex FAIL
>> +@item FAIL
>> +Make the @code{define_peephole2} fail on this occasion.  When a 
>> @code{define_peephole2}
>> +fails, it means that the replacement was not truly available for the
>> +particular inputs it was given, and the input insns are left unchanged.
> 
> If it FAILs, GCC will try to apply later define_peehole2s instead.
> (This is in contrast to define_split, so it's a bit inconsistent.
> Would be easy to make define_split behave the same way if there was a
> motivating case.)

Interesting.  I added words to the FAIL description of both define_split and 
define_peephole2 to state the behavior and highlight the difference.

ChangeLog:

2018-07-05  Paul Koning  

* doc/md.texi (define_split): Document DONE and FAIL.  Describe
interaction with usual "#" output template in
define_insn_and_split.
(define_peephole2): Document DONE and FAIL.

Index: doc/md.texi
===
--- doc/md.texi (revision 262455)
+++ doc/md.texi (working copy)
@@ -8060,6 +8060,30 @@ those in @code{define_expand}, however, these stat
 generate any new pseudo-registers.  Once reload has completed, they also
 must not allocate any space in the stack frame.
 
+There are two special macros defined for use in the preparation statements:
+@code{DONE} and @code{FAIL}.  Use them with a following semicolon,
+as a statement.
+
+@table @code
+
+@findex DONE
+@item DONE
+Use the @code{DONE} macro to end RTL generation for the splitter.  The
+only RTL insns generated as replacement for the matched input insn will
+be those already emitted by explicit calls to @code{emit_insn} within
+the preparation statements; the replacement pattern is not used.
+
+@findex FAIL
+@item FAIL
+Make the @code{define_split} fail on this occasion.  When a @code{define_split}
+fails, it means that the 

Re: Inefficient code

2018-07-06 Thread Paul Koning



> On Jul 6, 2018, at 6:18 AM, Bernd Edlinger  wrote:
> 
> You can get much better code if you make xrci a bit field.
> so the entire bit filed region can be accessed word-wise:
> 
> 
> #include 
> 
> struct Xrb
> {
>uint16_t xrlen; /* Length of I/O buffer in bytes */
>uint16_t xrbc;  /* Byte count for transfer */
>void * xrloc;   /* Pointer to I/O buffer */
>uint8_t xrci:8; /* Channel number times 2 for transfer */
>uint32_t xrblk:24;  /* Random access block number */
>uint16_t xrtime;/* Wait time for terminal input */
>uint16_t xrmod; /* Modifiers */
> };
> 
> void test(struct Xrb *XRB)
> {
>XRB->xrblk = 5;
> }
> 
> 
> Bernd.

That helps with x86.  It makes no difference with xstormy16, and it makes 
things slightly worse with pdp11 (though I can fiddle with the patterns to help 
with that; there's a zero_extend optimization I haven't coded yet).

On the other hand, since the two are equivalent it's reasonable to call this a 
missed optimization.

paul



Re: Inefficient code

2018-07-05 Thread Paul Koning



> On Jul 5, 2018, at 9:01 PM, Paul Koning  wrote:
> 
> 
> 
>> On Jul 5, 2018, at 6:47 PM, Eric Botcazou  wrote:
>> 
>>> So back to the previous one: anything I can do about a 24 bit field getting
>>> split into three movqi rather than a movqi plus a movhi?  That happens
>>> during RTL expand, I believe.
>> 
>> Yes, this one doesn't look as hopeless as the store merging issue.  A way of 
>> tackling it would be to do a side-by-side debugging of a compiler built for 
>> a 
>> similar target for which only 2 stores are generated.
> 
> I'll try xstormy16 since that's also 16 bit words, strict alignment.
> 
> Then again, I fed the code to GCC for VAX and it also produces a sequence of 
> 3 separate byte stores.  No mixed endians there.

Xstormy does 3 mov.b also.  For that matter, so does the x86 target (both -m32 
and -m64).  Hm.

paul




Re: Inefficient code

2018-07-05 Thread Paul Koning



> On Jul 5, 2018, at 6:47 PM, Eric Botcazou  wrote:
> 
>> So back to the previous one: anything I can do about a 24 bit field getting
>> split into three movqi rather than a movqi plus a movhi?  That happens
>> during RTL expand, I believe.
> 
> Yes, this one doesn't look as hopeless as the store merging issue.  A way of 
> tackling it would be to do a side-by-side debugging of a compiler built for a 
> similar target for which only 2 stores are generated.

I'll try xstormy16 since that's also 16 bit words, strict alignment.

Then again, I fed the code to GCC for VAX and it also produces a sequence of 3 
separate byte stores.  No mixed endians there.

paul



[PATCH] doc clarification: DONE and FAIL in define_split and define_peephole2

2018-07-05 Thread Paul Koning
Currently DONE and FAIL are documented only for define_expand, but they also 
work in essentially the same way for define_split and define_peephole2.

If FAIL is used in a define_insn_and_split, the output pattern cannot be the 
usual "#" dummy value. 

This patch updates the doc to describe those cases.  Ok for trunk?

paul

ChangeLog:

2018-07-05  Paul Koning  

* doc/md.texi (define_split): Document DONE and FAIL.  Describe
interaction with usual "#" output template in
define_insn_and_split.
(define_peephole2): Document DONE and FAIL.

Index: doc/md.texi
===
--- doc/md.texi (revision 262455)
+++ doc/md.texi (working copy)
@@ -8060,6 +8060,30 @@ those in @code{define_expand}, however, these stat
 generate any new pseudo-registers.  Once reload has completed, they also
 must not allocate any space in the stack frame.
 
+There are two special macros defined for use in the preparation statements:
+@code{DONE} and @code{FAIL}.  Use them with a following semicolon,
+as a statement.
+
+@table @code
+
+@findex DONE
+@item DONE
+Use the @code{DONE} macro to end RTL generation for the splitter.  The
+only RTL insns generated as replacement for the matched input insn will
+be those already emitted by explicit calls to @code{emit_insn} within
+the preparation statements; the replacement pattern is not used.
+
+@findex FAIL
+@item FAIL
+Make the @code{define_split} fail on this occasion.  When a @code{define_split}
+fails, it means that the splitter was not truly available for the inputs
+it was given, and this split is not done.
+@end table
+
+If the preparation falls through (invokes neither @code{DONE} nor
+@code{FAIL}), then the @code{define_split} uses the replacement
+template.
+
 Patterns are matched against @var{insn-pattern} in two different
 circumstances.  If an insn needs to be split for delay slot scheduling
 or insn scheduling, the insn is already known to be valid, which means
@@ -8232,6 +8256,15 @@ functionality as two separate @code{define_insn} a
 patterns.  It exists for compactness, and as a maintenance tool to prevent
 having to ensure the two patterns' templates match.
 
+In @code{define_insn_and_split}, the output template is usually simply
+@samp{#} since the assembly output is done by @code{define_insn}
+statements matching the generated insns, not by this
+@code{define_insn_and_split} statement.  But if @code{FAIL} is used in
+the preparation statements for certain input insns, those will not be
+split and during assembly output will again match this
+@code{define_insn_and_split}.  In that case, the appropriate assembly
+output statements are needed in the output template.
+
 @end ifset
 @ifset INTERNALS
 @node Including Patterns
@@ -8615,6 +8648,31 @@ so here's a silly made-up example:
   "")
 @end smallexample
 
+There are two special macros defined for use in the preparation statements:
+@code{DONE} and @code{FAIL}.  Use them with a following semicolon,
+as a statement.
+
+@table @code
+
+@findex DONE
+@item DONE
+Use the @code{DONE} macro to end RTL generation for the peephole.  The
+only RTL insns generated as replacement for the matched input insn will
+be those already emitted by explicit calls to @code{emit_insn} within
+the preparation statements; the replacement pattern is not used.
+
+@findex FAIL
+@item FAIL
+Make the @code{define_peephole2} fail on this occasion.  When a 
@code{define_peephole2}
+fails, it means that the replacement was not truly available for the
+particular inputs it was given, and the input insns are left unchanged.
+@end table
+
+If the preparation falls through (invokes neither @code{DONE} nor
+@code{FAIL}), then the @code{define_peephole2} uses the replacement
+template.
+
+
 @noindent
 If we had not added the @code{(match_dup 4)} in the middle of the input
 sequence, it might have been the case that the register we chose at the



Re: Inefficient code

2018-07-05 Thread Paul Koning



> On Jul 5, 2018, at 4:44 PM, Eric Botcazou  wrote:
> 
> ...
> The GIMPLE pass responsible for the optimization simply punts for the "funny-
> endian ordering" of the PDP11.  More generally, you shouldn't expect anything 
> sparkling for such a peculiar architecture as the PDP11.

Ok.  Yet another item for the machine specific optimization pass (to be 
written).

So back to the previous one: anything I can do about a 24 bit field getting 
split into three movqi rather than a movqi plus a movhi?  That happens during 
RTL expand, I believe.

paul



Re: decrement_and_branch_until_zero pattern

2018-07-05 Thread Paul Koning



> On Jun 8, 2018, at 6:16 PM, Jim Wilson  wrote:
> 
> On Fri, Jun 8, 2018 at 1:12 PM, Paul Koning  wrote:
>> Thanks.  I saw those sections and interpreted them as support for signal 
>> processor style fast hardware loops.  If they can be adapted for dbra type 
>> looping, great.  I'll give that a try.
> 
> The rs6000 port uses it for bdnz (branch decrement not zero) for
> instance, which is similar to the m68k dbra.
> 
>> Meanwhile, yes, it looks like there is a documentation bug.  I can clean 
>> that up.  It's more than a few lines, but does that qualify for an "obvious" 
>> change?
> 
> I think the obvious rule should only apply to trivial patches, and
> this will require some non-trivial changes to fix the looping pattern
> section.  Just deleting the decrement_and_branch_until_zero named
> pattern section looks trivial.  It looks like the REG_NONNEG section
> should  mention the doloop_end pattern instead of
> decrement_and_branch_until_zero, since I think the same rule applies
> that they only get generated if the doloop_end pattern exists.

It appears that doloop_end doesn't get a REG_NONNEG pattern.  I just added 
doloop_end to my target and I don't see any such note in the RTL dumps.

By the way, the documentation makes it clear that the register used in the 
doloop isn't available as an "induction variable".  I wonder if there is a way 
to make that restriction go away, since my target -- and I suspect several 
others as well -- use a general register for the loop control.  So as far as 
the target is concerned the loop control register is definitely available.  I 
understand the original restriction for machines like DSPs that loop using 
special-purpose loop registers, but that isn't a universal limitation.

paul




Re: Inefficient code

2018-07-05 Thread Paul Koning



> On Jul 5, 2018, at 12:01 PM, Segher Boessenkool  
> wrote:
> 
> On Thu, Jul 05, 2018 at 08:45:30AM -0400, Paul Koning wrote:
>> I have a struct that looks like this:
>> 
>> struct Xrb
>> {
>>uint16_t xrlen;   /* Length of I/O buffer in bytes */
>>uint16_t xrbc;/* Byte count for transfer */
>>void * xrloc; /* Pointer to I/O buffer */
>>uint8_t xrci; /* Channel number times 2 for transfer */
>>uint32_t xrblk:24;/* Random access block number */
>>uint16_t xrtime;  /* Wait time for terminal input */
>>uint16_t xrmod;   /* Modifiers */
>> };
>> 
>> When I write to xrblk (that 24 bit field) on my 16 bit target, I get 
>> unexpectly inefficient output:
>> 
>>XRB->xrblk = 5;
>> 
>>  movb#5,10(r0)
>>  clrb11(r0)
>>  clrb7(r0)
> 
> (7? not 12?)

Octal offsets.  It's writing the 3 bytes in LSB to MSB order.  (PDP11 -- which 
has funny-endian ordering.)

> rather than the expected word write to the word-aligned lower half of that 
> field.
>> 
>> Looking at the dumps, I see it coming into the RTL expand phase as a single 
>> write, which expand then turns into the three insns corresponding to the 
>> above.  But (of course) there is a word (HImode) move also, which has the 
>> same cost as the byte one.
>> 
>> Is there something I have to do in my target definition to get this to come 
>> out right?  This is a strict_alignment target, but alignment is satisfied in 
>> this example.  Also, SLOW_BYTE_ACCESS is 1.
> 
> What is your MOVE_MAX?  It should be 2 probably.

It is. 

I just constructed another test case that shows the same issue more blatantly:

struct s
{
char a;
char b;
char c;
char d;
int e;
int f;
char h;
char i;
};

struct s ts;

void setts(void)
{
ts.a=2;
ts.b=4;
ts.c=1;
ts.d=24;
ts.e=5;
ts.f=42;
ts.h=9;
ts.i=3;
}

Each of the fields are written separately, even though clearly the adjacent 
byte writes can and should be combined into a single HImode move.  This happens 
both with -O2 and -Os.

paul



Re: ChangeLog's: do we have to?

2018-07-05 Thread Paul Koning



> On Jul 5, 2018, at 10:44 AM, Florian Weimer  wrote:
> 
> * Richard Kenner:
> 
>>> GCC ChangeLogs don't record the purpose of the change. They say what 
>>> changed,
>>> but not why.
>> 
>> That depends on how you define "purpose".  Let's take a random entry,
>> from a 1999 change of mine:
>> 
>>* expr.c (expand_expr): If ignoring reference operations,
>>just expand the operands.
>>(expand_expr, case COMPONENT_REF): Refine test for when need
>>to use bitfield operations or pointer punning.
>> 
>> This indeed doesn't say why the change was made (though later policy
>> did tend to at least suggest adding PR numbers), but does say, in some
>> detail, what was changed.  You can't get that informtion from the current
>> state of the code.  You could indeed derive it from the diff itself, but
>> that's work and having a short summary is very helpful.
> 
> But this is against GNU policy for ChangeLog entries.  Explanations of
> the change should go into the source code, as comments.
> 
> (Of course, this policy is problematic if you *remove* code.)

More than that: comments in code normally explain what the code does today.  
They do not explain (and should not) how the current code differs from what was 
there before.   So they give no insight in the reason for the change, except 
only very indirectly and then only some of the time.

paul




Re: ChangeLog's: do we have to?

2018-07-05 Thread Paul Koning



> On Jul 5, 2018, at 10:43 AM, Florian Weimer  wrote:
> 
> * Aldy Hernandez:
> 
>> Can someone refresh my memory here, what are the remaining arguments for 
>> requiring ChangeLog entries?
> 
> ChangeLog entries are part of the review, commit messages are not, so
> you end up with surprises there.  At least that's what happens in
> glibc.

That issue could be fixed easily by having the proposed commit message 
reviewed.  That's approximately what we have today since the commit message is 
basically the change log entry, with perhaps another sentence or two.

paul




Inefficient code

2018-07-05 Thread Paul Koning
I have a struct that looks like this:

struct Xrb
{
uint16_t xrlen; /* Length of I/O buffer in bytes */
uint16_t xrbc;  /* Byte count for transfer */
void * xrloc;   /* Pointer to I/O buffer */
uint8_t xrci;   /* Channel number times 2 for transfer */
uint32_t xrblk:24;  /* Random access block number */
uint16_t xrtime;/* Wait time for terminal input */
uint16_t xrmod; /* Modifiers */
};

When I write to xrblk (that 24 bit field) on my 16 bit target, I get unexpectly 
inefficient output:

XRB->xrblk = 5;

movb#5,10(r0)
clrb11(r0)
clrb7(r0)

rather than the expected word write to the word-aligned lower half of that 
field.

Looking at the dumps, I see it coming into the RTL expand phase as a single 
write, which expand then turns into the three insns corresponding to the above. 
 But (of course) there is a word (HImode) move also, which has the same cost as 
the byte one.

Is there something I have to do in my target definition to get this to come out 
right?  This is a strict_alignment target, but alignment is satisfied in this 
example.  Also, SLOW_BYTE_ACCESS is 1.

paul




[PATCH, committed] Add -mgnu-asm to pdp11 target, change -mdec-asm

2018-07-01 Thread Paul Koning
The pdp11 target has long had -mdec-asm which was documented to generate DEC 
compatible assembly language output but actually produces GNU assembler output. 
 This patch adds -mgnu-asm to do what -mdec-asm used to do, and -mdec-asm now 
does produces output acceptable to DEC Macro-11.

Committed.

paul

ChangeLog:

2018-07-01  Paul Koning  

* common/config/pdp11/pdp11-common.c (pdp11_handle_option): Handle
-munit-asm, -mgnu-asm, -mdec-asm.
* config/pdp11/pdp11-protos.h (pdp11_gen_int_label): New.
(pdp11_output_labelref): New.
(pdp11_output_def): New.
(pdp11_output_addr_vec_elt): New.
* config/pdp11/pdp11.c: Use tab between opcode and operands.  Use
%# and %@ format codes.
(pdp11_option_override): New.
(TARGET_ASM_FILE_START_FILE_DIRECTIVE): Define.
(pdp11_output_ident): New.
(pdp11_asm_named_section): New.
(pdp11_asm_init_sections): New.
(pdp11_file_start): New.
(pdp11_file_end): New.
(output_ascii): Use .ascii/.asciz for -mdec-asm.
(pdp11_asm_print_operand): Update %# and %$ for -mdec-asm.  Add
%o, like %c but octal.
(pdp11_option_override): New.
* config/pdp11/pdp11.h (TEXT_SECTION_ASM_OP): Update for
-mdec-asm.
(DATA_SECTION_ASM_OP): Ditto.
(READONLY_DATA_SECTION_ASM_OP): New.
(IS_ASM_LOGICAL_LINE_SEPARATOR): New.
(ASM_GENERATE_INTERNAL_LABEL): Use new function.
(ASM_OUTPUT_LABELREF): Ditto.
(ASM_OUTPUT_DEF): Ditto.
(ASM_OUTPUT_EXTERNAL): New.
(ASM_OUTPUT_SOURCE_FILENAME): New.
(ASM_OUTPUT_ADDR_VEC_ELT): Use new function.
(ASM_OUTPUT_SKIP): Update for -mdec-asm.
* config/pdp11/pdp11.md: Use tab between opcode and operands.  Use
%# and %@ format codes.
* config/pdp11/pdp11.opt (mgnu-asm): New.
(mdec-asm): Conflicts with -mgnu-asm and -munix-asm.
(munix-asm): Conflicts with -mdec-asm and -mgnu-asm.
* doc/invoke.txt (PDP-11 Options): Add -mgnu-asm.

Index: common/config/pdp11/pdp11-common.c
===
--- common/config/pdp11/pdp11-common.c  (revision 262287)
+++ common/config/pdp11/pdp11-common.c  (working copy)
@@ -59,7 +59,16 @@ pdp11_handle_option (struct gcc_options *opts,
   opts->x_target_flags &= ~MASK_40;
   opts->x_target_flags |= MASK_45;
   return true;
-  
+
+case OPT_munix_asm:
+case OPT_mgnu_asm:
+  targetm_common.have_named_sections = false;
+  return true;
+
+case OPT_mdec_asm:
+  targetm_common.have_named_sections = true;
+  return true;
+
 default:
   return true;
 }
Index: config/pdp11/pdp11-protos.h
===
--- config/pdp11/pdp11-protos.h (revision 262287)
+++ config/pdp11/pdp11-protos.h (working copy)
@@ -51,3 +51,7 @@ extern void pdp11_asm_output_var (FILE *, const ch
 extern void pdp11_expand_prologue (void);
 extern void pdp11_expand_epilogue (void);
 extern poly_int64 pdp11_push_rounding (poly_int64);
+extern void pdp11_gen_int_label (char *, const char *, int);
+extern void pdp11_output_labelref (FILE *, const char *);
+extern void pdp11_output_def (FILE *, const char *, const char *);
+extern void pdp11_output_addr_vec_elt (FILE *, int);
Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 262287)
+++ config/pdp11/pdp11.c(working copy)
@@ -212,7 +212,7 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #undef  TARGET_PREFERRED_OUTPUT_RELOAD_CLASS
 #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS 
pdp11_preferred_output_reload_class
 
-#undef TARGET_LRA_P
+#undef  TARGET_LRA_P
 #define TARGET_LRA_P hook_bool_void_false
 
 #undef  TARGET_LEGITIMATE_ADDRESS_P
@@ -221,9 +221,30 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #undef  TARGET_CONDITIONAL_REGISTER_USAGE
 #define TARGET_CONDITIONAL_REGISTER_USAGE pdp11_conditional_register_usage
 
+#undef  TARGET_OPTION_OVERRIDE
+#define TARGET_OPTION_OVERRIDE pdp11_option_override
+
+#undef  TARGET_ASM_FILE_START_FILE_DIRECTIVE
+#define TARGET_ASM_FILE_START_FILE_DIRECTIVE true
+
+#undef  TARGET_ASM_OUTPUT_IDENT
+#define TARGET_ASM_OUTPUT_IDENT pdp11_output_ident
+
 #undef  TARGET_ASM_FUNCTION_SECTION
 #define TARGET_ASM_FUNCTION_SECTION pdp11_function_section
 
+#undef  TARGET_ASM_NAMED_SECTION
+#defineTARGET_ASM_NAMED_SECTION pdp11_asm_named_section
+
+#undef  TARGET_ASM_INIT_SECTIONS
+#define TARGET_ASM_INIT_SECTIONS pdp11_asm_init_sections
+
+#undef  TARGET_ASM_FILE_START
+#define TARGET_ASM_FILE_START pdp11_file_start
+
+#undef  TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END pdp11_file_end
+
 #undef  TARGET_PRINT_OPERAND
 #define TARGET_PRINT_OPERAND pdp11_asm_print_operand
 
@@ -238,6 +259,7 @@ static bool pdp11_scalar_mode_supported_

[PATCH, committed] Fix insn length in pdp11 shift patterns

2018-06-28 Thread Paul Koning
This patch corrects the setting of the "length" attributes in the pdp11 target 
for certain shift cases.

Committed.

paul

ChangeLog:

2018-06-28  Paul Koning  

* config/pdp11/pdp11-protos.h (pdp11_shift_length): New function.
* config/pdp11/pdp11.c (pdp11_shift_length): New function.
* config/pdp11/pdp11.h (ADJUST_INSN_LENGTH): Remove.
* config/pdp11/pdp11.md: Correct "length" attribute calculation
for shift insn patterns.

Index: config/pdp11/pdp11-protos.h
===
--- config/pdp11/pdp11-protos.h (revision 262198)
+++ config/pdp11/pdp11-protos.h (working copy)
@@ -41,6 +41,7 @@ extern machine_mode pdp11_cc_mode (enum rtx_code,
 extern bool pdp11_expand_shift (rtx *, rtx (*) (rtx, rtx, rtx),
rtx (*) (rtx, rtx, rtx));
 extern const char * pdp11_assemble_shift (rtx *, machine_mode, int);
+extern int pdp11_shift_length (rtx *, machine_mode, int, bool);
 extern bool pdp11_small_shift (int);
 
 #endif /* RTX_CODE */
Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 262198)
+++ config/pdp11/pdp11.c(working copy)
@@ -2020,6 +2020,35 @@ pdp11_assemble_shift (rtx *operands, machine_mode
   return "";
 }
 
+/* Figure out the length of the instructions that will be produced for
+   the given operands by pdp11_assemble_shift above.  */
+int
+pdp11_shift_length (rtx *operands, machine_mode m, int code, bool 
simple_operand_p)
+{
+  int shift_size;
+
+  /* Shift by 1 is 2 bytes if simple operand, 4 bytes if 2-word addressing 
mode.  */
+  shift_size = simple_operand_p ? 2 : 4;
+
+  /* In SImode, two shifts are needed per data item.  */
+  if (m == E_SImode)
+shift_size *= 2;
+
+  /* If shifting by a small constant, the loop is unrolled by the
+ shift count.  Otherwise, account for the size of the decrement
+ and branch.  */
+  if (CONSTANT_P (operands[2]) && pdp11_small_shift (INTVAL (operands[2])))
+shift_size *= INTVAL (operands[2]);
+  else
+shift_size += 4;
+
+  /* Logical right shift takes one more instruction (CLC).  */
+  if (code == LSHIFTRT)
+shift_size += 2;
+
+  return shift_size;
+}
+
 /* Worker function for TARGET_TRAMPOLINE_INIT.
 
trampoline - how should i do it in separate i+d ? 
Index: config/pdp11/pdp11.h
===
--- config/pdp11/pdp11.h(revision 262198)
+++ config/pdp11/pdp11.h(working copy)
@@ -123,22 +123,6 @@ extern const struct real_format pdp11_d_format;
 /* Define this if move instructions will actually fail to work
when given unaligned data.  */
 #define STRICT_ALIGNMENT 1
-
-/* Adjust the length of shifts by small constant amounts.  The base
-   value (in "length" on input) is the length of a shift by one, not
-   including the CLC in logical shifts.  */
-#define ADJUST_INSN_LENGTH(insn, length) \
-  if ((GET_CODE (insn) == ASHIFT || \
-   GET_CODE (insn) == ASHIFTRT || \
-   GET_CODE (insn) == LSHIFTRT) && \
-  GET_CODE (XEXP (insn, 2)) == CONST_INT && \
-  pdp11_small_shift (XINT (insn, 2))) \
-{\
-  if (GET_CODE (insn) == LSHIFTRT)   \
-   length = (length * XINT (insn, 2)) + 2; \
-  else \
-   length *= XINT (insn, 2); \
-}
 
 /* Standard register usage.  */
 
Index: config/pdp11/pdp11.md
===
--- config/pdp11/pdp11.md   (revision 262198)
+++ config/pdp11/pdp11.md   (working copy)
@@ -1297,11 +1297,6 @@
 ;; used to reduce the amount of very similar code.
 ;;
 ;; First the insns used for small constant shifts.
-;
-;; The "length" attribute values are modified by the ADJUST_INSN_LENGTH
-;; macro for the small constant shift case (first two alternatives).
-;; For those, the value coded in the length attribute is the cost of just
-;; the shift for a single shift.
 (define_insn "_sc"
   [(set (match_operand:QHSint 0 "nonimmediate_operand" "=rD,Q")
(SHF:QHSint (match_operand:QHSint 1 "general_operand" "0,0")
@@ -1308,7 +1303,9 @@
(match_operand:HI 2 "expand_shift_operand" "O,O")))]
   ""
   "* return pdp11_assemble_shift (operands, , );"
-  [(set_attr "length" "2,4")])
+  [(set (attr "length")
+   (symbol_ref "pdp11_shift_length (operands, , 
+ , which_alternative == 0)"))])
 
 ;; Next, shifts that are done as a loop on base (11/10 class) machines.
 ;; This applies to shift counts too large to unroll, or variable shift
@@ -1320,7 +1317,9 @@
(clobber (match_dup 2))]
   ""
   "* return pdp11_assemble_shift (operands, 

[PATCH, committed] Convert pdp11 back end to CCmode

2018-06-27 Thread Paul Koning
This change converts the pdp11 back end to use CCmode rather than cc0.  It is 
functional and the testsuite compile sections runs at least as well as before.

There is additional work left to be done; for example, the compare elimination 
pass, which this target uses, does not know how to deal with a machine that has 
two condition code registers.

Committed.

paul

ChangeLog:

2018-06-27  Paul Koning  

* common/config/pdp11/pdp11-common.c (pdp11_handle_option): Handle
mutually exclusive options.
* config/pdp11/constraints.md (h): New constraint.
(O): Update definition to match shift code generation.
(D): New constraint.
* config/pdp11/pdp11-modes.def (CCNZ): Define mode.
(CCFP): Remove.
* config/pdp11/pdp11-protos.h (int_no_side_effect_operand): New
function.
(output_jump): Change arguments.
(pdp11_fixed_cc_regs): New function.
(pdp11_cc_mode): Ditto.
(pdp11_expand_shift): Ditto.
(pdp11_assemble_shift): Ditto.
(pdp11_small_shift): Ditto.
(pdp11_branch_cost): Remove.
* config/pdp11/pdp11.c (pdp11_assemble_integer): Remove comments
from output.
(pdp11_register_move_cost): Update for CC registers.
(pdp11_rtx_costs): Add case for LSHIFTRT.
(pdp11_output_jump): Add CCNZ mode conditional branches.
(notice_update_cc_on_set): Remove.
(pdp11_cc_mode): New function.
(simple_memory_operand): Correct pre/post decrement case.
(no_side_effect_operand): New function.
(pdp11_regno_reg_class): Add CC_REGS class.
(pdp11_fixed_cc_regs): New function.
(pdp11_small_shift): New function.
(pdp11_expand_shift): New function to expand shift insns.
(pdp11_assemble_shift): New function to output shifts.
(pdp11_branch_cost): Remove.
(pdp11_modes_tieable_p): Make QI/HI modes tieable.
* config/pdp11/pdp11.h (SIZE_TYPE): Ensure 16-bit type.
(WCHAR_TYPE): Ditto.
(PTRDIFF_TYPE): Ditto.
(ADJUST_INSN_LENGTH): New macro.
(FIXED_REGISTERS): Add CC registers.
(CALL_USED_REGISTERS): Ditto.
(reg_class): Ditto.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
(SELECT_CC_MODE): Use new function.
(TARGET_FLAGS_REGNUM): New macro.
(TARGET_FIXED_CONDITION_CODE_REGS): Ditto.
(cc0_reg_rtx): Remove.
(CC_STATUS_MDEP): Remove.
(CC_STATUS_MDEFP_INIT): Remove.
(CC_IN_FPU): Remove.
(NOTICE_UPDATE_CC): Remove.
(REGISTER_NAMES): Add CC registers.
(BRANCH_COST): Change to constant 1.
* config/pdp11/pdp11.md: Rewrite for CCmode condition code
handling.
* config/pdp11/pdp11.opt (mbcopy): Remove.
(mbcopy-builtin): Remove.
(mbranch-cheap): Remove.
(mbranch-expensive): Remove.
* config/pdp11/predicates.md (expand_shift_operand): Update to
match shift code generation.
(ccnz_operator): New predicate.
* doc/invoke.texi (PDP-11 Options): Remove deleted options
-mbcopy, -mbcopy-builtin, -mbranch-cheap, -mbranch-expensive.
Remove non-existent option -mabshi, -mno-abshi.  Document mutually
exclusive options.
* doc/md.texi (PDP-11): Document new D and h constraints.  Update
description of O constraint.

Index: common/config/pdp11/pdp11-common.c
===
--- common/config/pdp11/pdp11-common.c  (revision 262195)
+++ common/config/pdp11/pdp11-common.c  (working copy)
@@ -39,9 +39,27 @@ pdp11_handle_option (struct gcc_options *opts,
   switch (code)
 {
 case OPT_m10:
-  opts->x_target_flags &= ~(MASK_40 | MASK_45);
+  opts->x_target_flags &= ~(MASK_40 | MASK_45 | MASK_FPU | MASK_AC0 | 
MASK_SPLIT);
   return true;
 
+case OPT_m40:
+  opts->x_target_flags &= ~(MASK_45 | MASK_FPU | MASK_AC0 | MASK_SPLIT);
+  return true;
+
+case OPT_mfpu:
+  opts->x_target_flags &= ~MASK_40;
+  opts->x_target_flags |= MASK_45;
+  return true;
+  
+case OPT_msoft_float:
+  opts->x_target_flags &= ~MASK_AC0;
+  return true;
+
+case OPT_msplit:
+  opts->x_target_flags &= ~MASK_40;
+  opts->x_target_flags |= MASK_45;
+  return true;
+  
 default:
   return true;
 }
Index: config/pdp11/constraints.md
===
--- config/pdp11/constraints.md (revision 262195)
+++ config/pdp11/constraints.md (working copy)
@@ -18,11 +18,14 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+(define_register_constraint "a" "LOAD_FPU_REGS"
+  "FPU register that can be directly loaded from memory")
+
 (define_register_constraint "f" "F

[PATCH, committed] fix pdp11 target test failures

2018-06-22 Thread Paul Koning
This patch fixes a number of test suite failures caused by data too large for 
the address space or alignment larger than supported by this target.

paul

testsuite/ChangeLog:

2018-06-22  Paul Koning  

* gcc.c-torture/execute/builtins/lib/chk.c: Use smaller alignment
if pdp11.
* gcc.c-torture/compile/20010518-2.c: Skip if pdp11 -mint32.
* gcc.c-torture/compile/20040101-1.c: Ditto.
* gcc.c-torture/compile/20050622-1.c: Ditto.
* gcc.c-torture/compile/20080625-1.c: Ditto.
* gcc.c-torture/compile/20090107-1.c: Ditto.
* gcc.c-torture/compile/920501-12.c: Ditto.
* gcc.c-torture/compile/920501-4.c: Ditto.
* gcc.c-torture/compile/961203-1.c: Ditto.
* gcc.c-torture/compile/limits-externdecl.c: Ditto.
* gcc.c-torture/compile/pr25310.c: Ditto.

Index: testsuite/gcc.c-torture/execute/builtins/lib/chk.c
===
--- testsuite/gcc.c-torture/execute/builtins/lib/chk.c  (revision 261896)
+++ testsuite/gcc.c-torture/execute/builtins/lib/chk.c  (revision 261897)
@@ -3,10 +3,18 @@
 #include 
 #endif
 
+/* If some target has a Max alignment less than 16, please create
+   a #ifdef around the alignment and add your alignment.  */
+#ifdef __pdp11__
+#define ALIGNMENT 2
+#else
+#define ALIGNMENT 16
+#endif
+
 extern void abort (void);
 
 extern int inside_main;
-void *chk_fail_buf[256] __attribute__((aligned (16)));
+void *chk_fail_buf[256] __attribute__((aligned (ALIGNMENT)));
 volatile int chk_fail_allowed, chk_calls;
 volatile int memcpy_disallowed, mempcpy_disallowed, memmove_disallowed;
 volatile int memset_disallowed, strcpy_disallowed, stpcpy_disallowed;
Index: testsuite/gcc.c-torture/compile/920501-4.c
===
--- testsuite/gcc.c-torture/compile/920501-4.c  (revision 261896)
+++ testsuite/gcc.c-torture/compile/920501-4.c  (revision 261897)
@@ -1,5 +1,6 @@
 /* { dg-do assemble } */
 /* { dg-skip-if "ptxas times out" { nvptx-*-* } { "-O1" } { "" } } */
+/* { dg-skip-if "Array too big" { "pdp11-*-*" } { "-mint32" } } */
 
 foo ()
 {
Index: testsuite/gcc.c-torture/compile/20080625-1.c
===
--- testsuite/gcc.c-torture/compile/20080625-1.c(revision 261896)
+++ testsuite/gcc.c-torture/compile/20080625-1.c(revision 261897)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target int32plus } */
+/* { dg-skip-if "Array too big" { "pdp11-*-*" } { "-mint32" } } */
 
 struct peakbufStruct {
 unsigned int lnum [5000];
Index: testsuite/gcc.c-torture/compile/20040101-1.c
===
--- testsuite/gcc.c-torture/compile/20040101-1.c(revision 261896)
+++ testsuite/gcc.c-torture/compile/20040101-1.c(revision 261897)
@@ -1,4 +1,4 @@
-/* { dg-skip-if "not enough registers" { pdp11-*-* } { "-O[12s]" } { "" } } */
+/* { dg-skip-if "not enough registers" { pdp11-*-* } } */
 
 typedef unsigned short uint16_t;
 typedef unsigned int uint32_t;
Index: testsuite/gcc.c-torture/compile/20090107-1.c
===
--- testsuite/gcc.c-torture/compile/20090107-1.c(revision 261896)
+++ testsuite/gcc.c-torture/compile/20090107-1.c(revision 261897)
@@ -1,3 +1,5 @@
+/* { dg-skip-if "Array too big" { "pdp11-*-*" } { "-mint32" } } */
+
 /* Verify that we don't ICE by forming invalid addresses for unaligned
doubleword loads (originally for PPC64).  */
 
Index: testsuite/gcc.c-torture/compile/20010518-2.c
===
--- testsuite/gcc.c-torture/compile/20010518-2.c(revision 261896)
+++ testsuite/gcc.c-torture/compile/20010518-2.c(revision 261897)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "Array too big" { "pdp11-*-*" } { "-mint32" } } */
 
 /* Large static storage.  */
 
Index: testsuite/gcc.c-torture/compile/920501-12.c
===
--- testsuite/gcc.c-torture/compile/920501-12.c (revision 261896)
+++ testsuite/gcc.c-torture/compile/920501-12.c (revision 261897)
@@ -1,4 +1,5 @@
 /* { dg-do assemble } */
+/* { dg-skip-if "Array too big" { "pdp11-*-*" } { "-mint32" } } */
 
 x(x){return 3 + x;}
 a(x){int y[994]; return 3 + x;}
Index: testsuite/gcc.c-torture/compile/limits-externdecl.c
===
--- testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 261896)
+++ testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 261897)
@@ -1,4 +1,5 @@
 /* { dg-skip-if "ptxas runs out of me

ICE in a testcase, not sure about solution

2018-06-20 Thread Paul Koning
I'm running into an ICE in the GIMPLE phase, for gcc.c-torture/compile/386.c, 
on pdp11 -mint32.  That's an oddball where int is 32 bits (due to the flag) but 
Pmode is 16 bits (HImode).

The ICE message is:

../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c: In function ‘main’:
../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c:24:1: error: invalid types 
in nop conversion
 }
 ^
int
int *
b_3 = (int) 
during GIMPLE pass: einline
../../gcc/gcc/testsuite/gcc.c-torture/compile/386.c:24:1: internal compiler 
error: verify_gimple failed

The offending code snippet is (I think):

main ()
{
  int i;
  foobar (i, );
}


foobar (a, b)
{
  int c;

  c = a % b;
  a = a / b;
  return a + b;
}

where the foobar(i, ) call passes an int* to a (defaulted) int function 
parameter.  Is there an assumption that sizeof (int*) >= sizeof(int)?

Any idea where to look?  It only shows up with -mint32; if int is 16 bits all 
is well.  I'm not used to my target breaking things before I even get to RTL...

paul



Re: divmod pattern question

2018-06-20 Thread Paul Koning



> On Jun 20, 2018, at 1:16 AM, Jeff Law  wrote:
> 
> On 06/19/2018 12:55 PM, Paul Koning wrote:
>> Gentlepeople,
>> 
>> I have a two-operand divide instruction that takes a double length dividend 
>> in a register pair, and produces the quotient in the first register and 
>> remainder in the second.
>> 
>> How do I write a divmod pattern for that?  The quotient is easy enough, I 
>> write a match_operand for that register and a matching constraint ("0") for 
>> the input dividend.  But what about the remainder?  The remainder appears in 
>> a register that isn't explicitly mentioned in the RTL (it's the regnum one 
>> higher than the quotient, or if you like, the second subreg of the input 
>> (dividend) register.
> You can generally allocate double-sized registers with appropriate
> constraints and the like.  And you could use matching constraints,
> perhaps with subregs, but in the end, ew.
> 
>> 
>> I can make it a define_expand that adds a move from the remainder register 
>> into a new register which is the output operand, and count on the optimizer 
>> to optimize away that move.  Is that the best answer?  The current "mod" 
>> pattern does that, and I could keep that approach.
> But this would generally be better I think.  I'd expect the move to be
> optimized away the vast majority of the time.

Thanks.  I looked at some others, like M68k, the difference there is that the 
mod result goes to an explicitly named register in the machine instruction.

Here's what I ended up with; it seems to work even though it doesn't match 
precisely what the documentation seems to call for.

(define_expand "divmodhi4"
  [(parallel
[(set (subreg:HI (match_dup 1) 0)
(div:HI (match_operand:SI 1 "register_operand" "0")
(match_operand:HI 2 "general_operand" "g")))
 (set (subreg:HI (match_dup 1) 2)
(mod:HI (match_dup 1) (match_dup 2)))])
   (set (match_operand:HI 0 "register_operand" "=r")
(subreg:HI (match_dup 1) 0))
   (set (match_operand:HI 3 "register_operand" "=r")
(subreg:HI (match_dup 1) 2))]
  "TARGET_40_PLUS"
  "")

; and then the actual final instruction:
(define_insn "divmodhi4_nocc"
  [(set (subreg:HI (match_operand:SI 0 "register_operand" "=r,r") 0)
(div:HI (match_operand:SI 1 "register_operand" "0,0")
 (match_operand:HI 2 "general_operand" "rR,Qi")))
   (set (subreg:HI (match_dup 1) 2)
(mod:HI (match_dup 1) (match_dup 2)))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_40_PLUS"
   "div %2,%0"
  [(set_attr "length" "2,4")])

paul



divmod pattern question

2018-06-19 Thread Paul Koning
Gentlepeople,

I have a two-operand divide instruction that takes a double length dividend in 
a register pair, and produces the quotient in the first register and remainder 
in the second.

How do I write a divmod pattern for that?  The quotient is easy enough, I write 
a match_operand for that register and a matching constraint ("0") for the input 
dividend.  But what about the remainder?  The remainder appears in a register 
that isn't explicitly mentioned in the RTL (it's the regnum one higher than the 
quotient, or if you like, the second subreg of the input (dividend) register.

I can make it a define_expand that adds a move from the remainder register into 
a new register which is the output operand, and count on the optimizer to 
optimize away that move.  Is that the best answer?  The current "mod" pattern 
does that, and I could keep that approach.

paul



Re: [PATCH] Come up with Deprecated option flag.

2018-06-11 Thread Paul Koning



> On Jun 8, 2018, at 7:09 AM, Martin Liška  wrote:
> 
> Hi.
> 
> First follow-up MPX removal patch comes up with Deprecated option flag.
> That prints warning for options that have no effect:

Should this be mentioned in the internals manual (section 8.2)?

paul



Re: in/out operands and auto-inc-dec

2018-06-11 Thread Paul Koning



> On Jun 11, 2018, at 3:04 PM, Jeff Law  wrote:
> 
> On 06/04/2018 09:02 AM, Paul Koning wrote:
>> 
>> ...
>> 
>> By "multiple memory operands" do you mean both source and dest in
>> memory?  
> Yes and no :-) I suspect no real thought was given to what happens when
> there's more than one auto-inc in the pattern, regardless of what
> happens in the final instruction.
> 
> I realize that in your case the operand appears twice in the RTL, but
> just once in the final output.  One might argue that if the operands in
> the pattern are tied together via a match_dup that we ought to be able
> to support it.  But I doubt anyone has thought much about it.

The description of matching constraint makes it clear that it's meant to be 
different from match_dup.  The way it's described is (paraphrasing) that it 
acts like match_dup except if the operand it refers to has a side effect in it. 
 In that case, it matches an operand RTX with the side effect removed.  The 
example given is: if the referenced operand is *p++, the matching constraint 
looks for *p.

And that indeed would be right.  If I have a two-operand add instruction, 
(set (mem (post_inc (reg))) (add (mem (reg)) operand2)))

would make sense.  The documentation makes me expect such a thing to show up 
and be recognized by a matching constraint.  But the auto-inc code doesn't do 
this, and what's more interesting, it seems that the old code (in flow.c) 
didn't either.

paul



Re: decrement_and_branch_until_zero pattern

2018-06-08 Thread Paul Koning



> On Jun 8, 2018, at 2:29 PM, Jim Wilson  wrote:
> 
> On 06/08/2018 06:21 AM, Paul Koning wrote:
>> Interesting.  The ChangeLog doesn't give any background.  I suppose I should 
>> plan to approximate the effect of this pattern with a define-peephole2 ?
> 
> The old RTL loop optimizer was replaced with a new RTL loop optimizer. When 
> the old one was written, m68k was a major target, and the dbra optimization 
> was written for it.  When the new one was written, m68k was not a major 
> target, and this support was written differently.  We now have doloop_begin 
> and doloop_end patterns that do almost the same thing, and can be created by 
> the loop-doloop.c code.
> 
> There is a section in the internals docs that talks about this.
> https://gcc.gnu.org/onlinedocs/gccint/Looping-Patterns.html
> 
> The fact that we still have decrement_and_branch_until_zero references in 
> docs and target md files looks like a bug.  The target md files should use 
> doloop patterns instead, and the doc references should be dropped.

Thanks.  I saw those sections and interpreted them as support for signal 
processor style fast hardware loops.  If they can be adapted for dbra type 
looping, great.  I'll give that a try.

Meanwhile, yes, it looks like there is a documentation bug.  I can clean that 
up.  It's more than a few lines, but does that qualify for an "obvious" change?

paul



Re: decrement_and_branch_until_zero pattern

2018-06-08 Thread Paul Koning



> On Jun 8, 2018, at 6:59 AM, Andreas Schwab  wrote:
> 
> On Jun 07 2018, Paul Koning  wrote:
> 
>> None of these seem to use that loop optimization, with -O2 or -Os.  Did I
>> miss some magic switch to turn on something else that isn't on by default?
>> Or is this a feature that was broken long ago and not noticed?  If so, any
>> hints where I might look for a reason?
> 
> commit 7d3c6452d7
> Author: rakdver 
> Date:   Thu Mar 2 23:50:55 2006 +
> 
>* loop.c: Removed.

Interesting.  The ChangeLog doesn't give any background.  I suppose I should 
plan to approximate the effect of this pattern with a define-peephole2 ?  

paul



decrement_and_branch_until_zero pattern

2018-06-07 Thread Paul Koning
I'm a bit puzzled by the decrement_and_branch_until_zero looping pattern.  The 
manual described it as a named pattern, through from the description it isn't 
clear that it's referenced by name.  I see those only in m68k and pa.  There 
are similar looking but anonymous patterns in pdp11 and vax, suggesting that 
those were meant to be recognized by their structure.

One puzzle is that the body of gcc doesn't reference that pattern name as far 
as I can see.

The other puzzle is that I see no sign that the pattern works.  I made up my 
own simple test file and I can't get pdp11, vax, or m68k to generate a loop 
using that pattern.  Stranger yet, there is a test case 
gcc.c-torture/execution/dbra-1.c -- a name that suggests it's meant to test 
this mechanism because dbra is the m68k name for the relevant instruction.  
That test case doesn't generate these looping instructions either (I tried 
those also with m68k, vax, pdp11).  Finally, I tried that file with an old 
4.8.0 build for pdp11 I happened to have lying around.

None of these seem to use that loop optimization, with -O2 or -Os.  Did I miss 
some magic switch to turn on something else that isn't on by default?  Or is 
this a feature that was broken long ago and not noticed?  If so, any hints 
where I might look for a reason?

paul



Re: [PATCH] Avoid optimizing memory references with side effects in compare-elim.c

2018-06-07 Thread Paul Koning



> On Jun 7, 2018, at 11:37 AM, Eric Botcazou  wrote:
> 
>> Here is a corrected version.  Ok with this change?
>> 
>>  paul
>> 
>> gcc/ChangeLog:
>> 
>> 2018-06-06  Paul Koning  
>> 
>>  * compare-elim.c (try_merge_compare): Don't merge compare if
>>  address contains a side effect.
>>  (try_eliminate_compare): Likewise.
>>  * gcc.c-torture/compile/20180605-1.c: New test case.
> 
> OK for mainline, thanks.

Thanks.  I just realized that the testsuite has its own ChangeLog.  
So committed with the change message split between those two.

paul


gcc/ChangeLog:

2018-06-07  Paul Koning  

* compare-elim.c (try_merge_compare): Don't merge compare if
address contains a side effect.
    (try_eliminate_compare): Likewise.

gcc/testsuite/ChangeLog:

2018-06-07  Paul Koning  

* gcc.c-torture/compile/20180605-1.c: New test.



Re: [PATCH] Avoid optimizing memory references with side effects in compare-elim.c

2018-06-06 Thread Paul Koning



> On Jun 6, 2018, at 11:53 AM, Eric Botcazou  wrote:
> 
>> That simplifies the patch, which now looks like this.  Ok for trunk?
>> 
>>  paul
>> 
>> gcc/ChangeLog:
>> 
>> 2018-06-05  Paul Koning  
>> 
>>  * compare-elim.c (try_merge_compare): Don't merge compare if
>>  address contains a side effect.
>>  * gcc.c-torture/compile/20180605-1.c: New test case.
> 
> The patch is OK, but its ChangeLog is not. :-)  It does not modify one 
> function but two (try_merge_compare and try_eliminate_compare).  I would 
> suggest using "diff -up" to generate patches with more context.

Yes, sorry.  I normally use the subversion internal diff which doesn't do this.

Here is a corrected version.  Ok with this change?

paul

gcc/ChangeLog:

2018-06-06  Paul Koning  

* compare-elim.c (try_merge_compare): Don't merge compare if
address contains a side effect.
(try_eliminate_compare): Likewise.
* gcc.c-torture/compile/20180605-1.c: New test case.

Index: compare-elim.c
===
--- compare-elim.c  (revision 261207)
+++ compare-elim.c  (working copy)
@@ -690,6 +690,13 @@ try_merge_compare (struct comparison *cm
 return false;
 
   rtx src = SET_SRC (set);
+
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (side_effects_p (src))
+return false;
+
   rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
   if (!flags)
 {
@@ -809,6 +816,12 @@ try_eliminate_compare (struct comparison
   else
 return false;
 
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (side_effects_p (cmp_src))
+return false;
+
   /* Determine if we ought to use a different CC_MODE here.  */
   flags = maybe_select_cc_mode (cmp, cmp_src, in_b);
   if (flags == NULL)
Index: testsuite/gcc.c-torture/compile/20180605-1.c
===
--- testsuite/gcc.c-torture/compile/20180605-1.c(nonexistent)
+++ testsuite/gcc.c-torture/compile/20180605-1.c(working copy)
@@ -0,0 +1,9 @@
+void f (int *p, int n)
+{
+int j = 0, k;
+
+for (int i = 0; i < n; i++)
+if ((k = *p++) > 0)
+j += k;
+return j;
+}



Re: [PATCH] Avoid optimizing memory references with side effects in compare-elim.c

2018-06-05 Thread Paul Koning



> On Jun 5, 2018, at 3:54 PM, Richard Sandiford  
> wrote:
> 
> Paul Koning  writes:
>> This fixes an ICE if post-reload compare elimination is done and the
>> target supports post_inc and similar modes, as pdp11 does.  The ICE is
>> caused by a generated PARALLEL that has the side effect twice, which
>> is not legal.
>> 
>> (Ideally it would generate a similar RTL suitable for a matching
>> constraint with the side effect omitted; I may try for that later on
>> if that is still supported by the constraint machinery.)
>> 
>> Tested against my in-progress CCmode pdp11 target.  Ok to commit?
> 
> Thanks for doing the conversion :-)

It has a ways to go still.  And more changes to compare-elim may be needed, 
since pdp11 is an oddball that has two separate CC registers (one for the CPU, 
one for the FP coprocessor).  Compare-elim doesn't currently handle that.

Credit to Eric Botcazou, whose documentation makes this a whole lot easier.

> ...
> It looks like the more general side_effects_p might be better
> in both cases.  It's simpler than defining a custom function,
> and I'm not convinced the pass would handle other kinds of
> side-effect correctly either.

Thanks, I overlooked that function.  Yes, it's a better choice.  For example, 
compare-eliminating a volatile memory reference seems incorrect (it's not clear 
how likely it is to happen, but it makes sense to skip that case).

That simplifies the patch, which now looks like this.  Ok for trunk?

paul

gcc/ChangeLog:

2018-06-05  Paul Koning  

* compare-elim.c (try_merge_compare): Don't merge compare if
address contains a side effect.
* gcc.c-torture/compile/20180605-1.c: New test case.

Index: compare-elim.c
===
--- compare-elim.c  (revision 261207)
+++ compare-elim.c  (working copy)
@@ -690,6 +690,13 @@
 return false;
 
   rtx src = SET_SRC (set);
+
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (side_effects_p (src))
+return false;
+
   rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
   if (!flags)
 {
@@ -809,6 +816,12 @@
   else
 return false;
 
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (side_effects_p (cmp_src))
+return false;
+
   /* Determine if we ought to use a different CC_MODE here.  */
   flags = maybe_select_cc_mode (cmp, cmp_src, in_b);
   if (flags == NULL)
Index: testsuite/gcc.c-torture/compile/20180605-1.c
===
--- testsuite/gcc.c-torture/compile/20180605-1.c(nonexistent)
+++ testsuite/gcc.c-torture/compile/20180605-1.c(working copy)
@@ -0,0 +1,9 @@
+void f (int *p, int n)
+{
+int j = 0, k;
+
+for (int i = 0; i < n; i++)
+if ((k = *p++) > 0)
+j += k;
+return j;
+}



[PATCH] Avoid optimizing memory references with side effects in compare-elim.c

2018-06-05 Thread Paul Koning
(Resending -- forgot to include a test case.)

This fixes an ICE if post-reload compare elimination is done and the target 
supports post_inc and similar modes, as pdp11 does.  The ICE is caused by a 
generated PARALLEL that has the side effect twice, which is not legal.

(Ideally it would generate a similar RTL suitable for a matching constraint 
with the side effect omitted; I may try for that later on if that is still 
supported by the constraint machinery.)

Tested against my in-progress CCmode pdp11 target.  Ok to commit?

paul

gcc/ChangeLog:

2018-06-05  Paul Koning  

* compare-elim.c (addr_side_effect_check): New function.
(addr_side_effect_p): Ditto.
(try_merge_compare): Don't merge compare if address contains a
side effect.
* gcc.c-torture/compile/20180605-1.c: New test case.


Index: compare-elim.c
===
--- compare-elim.c  (revision 261207)
+++ compare-elim.c  (working copy)
@@ -127,6 +127,23 @@

static vec all_compares;

+/* Callback function used by addr_side_effect_p.  */
+static int
+addr_side_effect_check (rtx mem ATTRIBUTE_UNUSED, rtx op ATTRIBUTE_UNUSED,
+   rtx dest ATTRIBUTE_UNUSED, rtx src ATTRIBUTE_UNUSED,
+   rtx srcoff ATTRIBUTE_UNUSED, void *arg ATTRIBUTE_UNUSED)
+{
+  return 1;
+}
+
+/* Check if addr has side effects (contains autoinc or autodec
+   operations).  */
+static int
+addr_side_effect_p (rtx addr)
+{
+  return for_each_inc_dec (addr, addr_side_effect_check, NULL);
+} 
+  
/* Look for a "conforming" comparison, as defined above.  If valid, return
   the rtx for the COMPARE itself.  */

@@ -690,6 +707,13 @@
return false;

  rtx src = SET_SRC (set);
+
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (addr_side_effect_p (src))
+return false;
+
  rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
  if (!flags)
{
@@ -809,6 +833,12 @@
  else
return false;

+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (addr_side_effect_p (cmp_src))
+return false;
+
  /* Determine if we ought to use a different CC_MODE here.  */
  flags = maybe_select_cc_mode (cmp, cmp_src, in_b);
  if (flags == NULL)
Index: testsuite/gcc.c-torture/compile/20180605-1.c
===
--- testsuite/gcc.c-torture/compile/20180605-1.c(nonexistent)
+++ testsuite/gcc.c-torture/compile/20180605-1.c(working copy)
@@ -0,0 +1,9 @@
+void f (int *p, int n)
+{
+int j = 0, k;
+
+for (int i = 0; i < n; i++)
+if ((k = *p++) > 0)
+j += k;
+return j;
+}



[PATCH] Avoid optimizing memory references with side effects in compare-elim.c

2018-06-05 Thread Paul Koning
This fixes an ICE if post-reload compare elimination is done and the target 
supports post_inc and similar modes, as pdp11 does.  The ICE is caused by a 
generated PARALLEL that has the side effect twice, which is not legal.

(Ideally it would generate a similar RTL suitable for a matching constraint 
with the side effect omitted; I may try for that later on if that is still 
supported by the constraint machinery.)

Tested against my in-progress CCmode pdp11 target.  Ok to commit?

paul

gcc/ChangeLog:

2018-06-05  Paul Koning  

* compare-elim.c (addr_side_effect_check): New function.
(addr_side_effect_p): Ditto.
(try_merge_compare): Don't merge compare if address contains a
side effect.

Index: compare-elim.c
===
--- compare-elim.c  (revision 261207)
+++ compare-elim.c  (working copy)
@@ -127,6 +127,23 @@
   
 static vec all_compares;
 
+/* Callback function used by addr_side_effect_p.  */
+static int
+addr_side_effect_check (rtx mem ATTRIBUTE_UNUSED, rtx op ATTRIBUTE_UNUSED,
+   rtx dest ATTRIBUTE_UNUSED, rtx src ATTRIBUTE_UNUSED,
+   rtx srcoff ATTRIBUTE_UNUSED, void *arg ATTRIBUTE_UNUSED)
+{
+  return 1;
+}
+
+/* Check if addr has side effects (contains autoinc or autodec
+   operations).  */
+static int
+addr_side_effect_p (rtx addr)
+{
+  return for_each_inc_dec (addr, addr_side_effect_check, NULL);
+} 
+  
 /* Look for a "conforming" comparison, as defined above.  If valid, return
the rtx for the COMPARE itself.  */
 
@@ -690,6 +707,13 @@
 return false;
 
   rtx src = SET_SRC (set);
+
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (addr_side_effect_p (src))
+return false;
+
   rtx flags = maybe_select_cc_mode (cmp, src, CONST0_RTX (GET_MODE (src)));
   if (!flags)
 {
@@ -809,6 +833,12 @@
   else
 return false;
 
+  /* If the source uses addressing modes with side effects, we can't
+ do the merge because we'd end up with a PARALLEL that has two
+ instances of that side effect in it.  */
+  if (addr_side_effect_p (cmp_src))
+return false;
+
   /* Determine if we ought to use a different CC_MODE here.  */
   flags = maybe_select_cc_mode (cmp, cmp_src, in_b);
   if (flags == NULL)



Re: in/out operands and auto-inc-dec

2018-06-04 Thread Paul Koning



> On Jun 4, 2018, at 10:09 AM, Jeff Law  wrote:
> 
> On 06/04/2018 08:06 AM, Paul Koning wrote:
>> 
>> 
>>> On Jun 4, 2018, at 9:51 AM, Jeff Law  wrote:
>>> 
>>> On 06/04/2018 07:31 AM, Paul Koning wrote:
>>>> The internals manual in its description of the "matching constraint" says 
>>>> that it works for cases where the in and out operands are somewhat 
>>>> different, such as *p++ vs. *p.  Obviously that is meant to cover post_inc 
>>>> side effects.
>>>> 
>>>> The curious thing is that auto-inc-dec.c specifically avoids doing this: 
>>>> if it finds what looks like a suitable candidate for auto-inc or auto-dec 
>>>> optimization but that operand occurs more than once in the insn, it 
>>>> doesn't make the change.  The result is code that's both larger and slower 
>>>> for machines that have post_inc etc. addressing modes.  The gccint 
>>>> documentation suggests that it was the intent to optimize this case, so I 
>>>> wonder why it is avoided.
>>> I wouldn't be terribly surprised if the old flow.c based auto-inc
>>> discovery handled this, but the newer auto-inc-dec.c doesn't.  The docs
>>> were probably written prior to the conversion.
>> 
>> That fits, because there is a reference to "the flow pass of the compiler" 
>> when these constructs are introduced in section 14.16.
>> 
>> So is this an omission, or is there a reason why that optimization was 
>> removed?  
> I would guess omission, probably on the assumption it wasn't terribly
> important and there wasn't really a good way to test it.There aren't
> many targets that use auto-inc getting a lot of attention these days,
> and those that do can't have multiple memory operands.

By "multiple memory operands" do you mean both source and dest in memory?  Ok, 
but I didn't mean that specifically.  The issue is on an instruction with a 
read/modify/write destination operand, like two-operand add.  If the 
destination looks like a candidate for post-inc, it's skipped because it shows 
up twice in the RTL -- since that uses three operand notation.

For example:

for (int i = 0; i < n; i++)
*p++ += i;

produces (on pdp11):
add $02, r0
add r1, -02(r0)

rather than simply "add r1, (r0)+".  But if I change the += to =, the expected 
optimization does take place.

paul



Re: in/out operands and auto-inc-dec

2018-06-04 Thread Paul Koning



> On Jun 4, 2018, at 9:51 AM, Jeff Law  wrote:
> 
> On 06/04/2018 07:31 AM, Paul Koning wrote:
>> The internals manual in its description of the "matching constraint" says 
>> that it works for cases where the in and out operands are somewhat 
>> different, such as *p++ vs. *p.  Obviously that is meant to cover post_inc 
>> side effects.
>> 
>> The curious thing is that auto-inc-dec.c specifically avoids doing this: if 
>> it finds what looks like a suitable candidate for auto-inc or auto-dec 
>> optimization but that operand occurs more than once in the insn, it doesn't 
>> make the change.  The result is code that's both larger and slower for 
>> machines that have post_inc etc. addressing modes.  The gccint documentation 
>> suggests that it was the intent to optimize this case, so I wonder why it is 
>> avoided.
> I wouldn't be terribly surprised if the old flow.c based auto-inc
> discovery handled this, but the newer auto-inc-dec.c doesn't.  The docs
> were probably written prior to the conversion.

That fits, because there is a reference to "the flow pass of the compiler" when 
these constructs are introduced in section 14.16.

So is this an omission, or is there a reason why that optimization was removed? 
 

paul




in/out operands and auto-inc-dec

2018-06-04 Thread Paul Koning
The internals manual in its description of the "matching constraint" says that 
it works for cases where the in and out operands are somewhat different, such 
as *p++ vs. *p.  Obviously that is meant to cover post_inc side effects.

The curious thing is that auto-inc-dec.c specifically avoids doing this: if it 
finds what looks like a suitable candidate for auto-inc or auto-dec 
optimization but that operand occurs more than once in the insn, it doesn't 
make the change.  The result is code that's both larger and slower for machines 
that have post_inc etc. addressing modes.  The gccint documentation suggests 
that it was the intent to optimize this case, so I wonder why it is avoided.

paul



Re: [PATCH] add udivhi3, umodhi3 functions to libgcc

2018-06-01 Thread Paul Koning



> On Jun 1, 2018, at 5:04 PM, Joseph Myers  wrote:
> 
> On Tue, 29 May 2018, Paul Koning wrote:
> 
>> +unsigned short udivmodhi4(unsigned short, unsigned short, int);
> 
> libgcc should not have any such non-static functions in the user 
> namespace; they should all start with __.

That too is a problem in the previous code.

Ok, it sounds like the right answer is to withdraw this patch and create a new 
one that handles both this issue and cleans up the way udivsi is done.  That 
would replace the existing udivmod.c and udivmodsi4.c files.

paul



Re: Doing pdp11 cc0->CC conversion, running into ICE related to compare-elim

2018-06-01 Thread Paul Koning



> On Jun 1, 2018, at 2:40 PM, Jakub Jelinek  wrote:
> 
> On Fri, Jun 01, 2018 at 02:35:41PM -0400, Paul Koning wrote:
>> during RTL pass: dse2
>> dump file: unwind-dw2-fde.c.288r.dse2
>> ../../../../gcc/libgcc/unwind-dw2-fde.c: In function ‘get_cie_encoding’:
>> ../../../../gcc/libgcc/unwind-dw2-fde.c:342:1: internal compiler error: in 
>> cselib_record_set, at cselib.c:2409
>> }
>> ^
>> 
>> I don't understand what it's looking for or why it is unhappy.
>> 
>> The RTL that blows up, as reported in the 288r.dse dump file, looks like 
>> this:
> 
> I'd say it is upset by multiple post_inc in the same instruction, that is I
> believe invalid.
> 
>> (insn 195 194 197 11 (parallel [
>>(set (reg:CC 16 cc)
>>(compare:CC (mem:QI (post_inc:HI (reg/v/f:HI 0 r0 [orig:41 p 
>> ] [41])) [0 MEM[base: p_38, offset: 65535B]+0 S1 A8])
>>(const_int 0 [0])))
>>(set (reg:QI 2 r2 [orig:38 byte.10_40 ] [38])
>>(mem:QI (post_inc:HI (reg/v/f:HI 0 r0 [orig:41 p ] [41])) [0 
>> MEM[base: p_38, offset: 65535B]+0 S1 A8]))
>>]) "../../../../gcc/libgcc/unwind-pe.h":166 20 {*mov_ccqi}
>> (expr_list:REG_UNUSED (reg:QI 2 r2 [orig:38 byte.10_40 ] [38])
>>(expr_list:REG_UNUSED (reg/v/f:HI 0 r0 [orig:41 p ] [41])
>>(expr_list:REG_INC (reg/v/f:HI 0 r0 [orig:41 p ] [41])
>>(nil)

I was wondering about that.  compare-elim.cc is generating that.  It looks for 
the general pattern of
(parallel [(load dst src) (clobber cc)])
(compare dst val)
and converts that to
(parallel [ (compare src val) (load dst src)])

where there is a insn pattern of that form, such as this one:

(define_insn "*mov_cc"
  [(set (reg:CC CC_REGNUM)
(compare:CC
 (match_operand:PDPint 1 "general_operand" "rRN,Qi,rRN,Qi")
 (const_int 0)))
   (set (match_operand:PDPint 0 "nonimmediate_operand" "=rR,rR,Q,Q")
(match_dup 1))]
  "reload_completed"

Given that the starting insn had a post_inc in it, what would be a proper 
parallel... construct?  If the post_inc only appears in one of the two mentions 
of the source operatnd, then the match_dup is going to fail.  I suppose I can 
disallow pre and post_inc addressing modes, but that would be a lost 
optimization opportunity.

paul



Doing pdp11 cc0->CC conversion, running into ICE related to compare-elim

2018-06-01 Thread Paul Koning
Gentlepeople,

I'm using Eric Botcazou's recipe #2 for the CCmode version of pdp11 -- where 
most instructions step on the condition codes so the CC references are inserted 
post-reload.  As part of that, the compare-elim pass gets rid of compares that 
are redundant given that the instruction before it already set the CC correctly.

If I do that I get an ICE in cselib that I can't figure out.  If I turn off the 
post-reload compare elimination optimization then everything works.  

The error is this:

during RTL pass: dse2
dump file: unwind-dw2-fde.c.288r.dse2
../../../../gcc/libgcc/unwind-dw2-fde.c: In function ‘get_cie_encoding’:
../../../../gcc/libgcc/unwind-dw2-fde.c:342:1: internal compiler error: in 
cselib_record_set, at cselib.c:2409
 }
 ^

I don't understand what it's looking for or why it is unhappy.

The RTL that blows up, as reported in the 288r.dse dump file, looks like this:

(insn 195 194 197 11 (parallel [
(set (reg:CC 16 cc)
(compare:CC (mem:QI (post_inc:HI (reg/v/f:HI 0 r0 [orig:41 p ] 
[41])) [0 MEM[base: p_38, offset: 65535B]+0 S1 A8])
(const_int 0 [0])))
(set (reg:QI 2 r2 [orig:38 byte.10_40 ] [38])
(mem:QI (post_inc:HI (reg/v/f:HI 0 r0 [orig:41 p ] [41])) [0 
MEM[base: p_38, offset: 65535B]+0 S1 A8]))
]) "../../../../gcc/libgcc/unwind-pe.h":166 20 {*mov_ccqi}
 (expr_list:REG_UNUSED (reg:QI 2 r2 [orig:38 byte.10_40 ] [38])
(expr_list:REG_UNUSED (reg/v/f:HI 0 r0 [orig:41 p ] [41])
(expr_list:REG_INC (reg/v/f:HI 0 r0 [orig:41 p ] [41])
(nil)

I wonder if the REG_UNUSED is the issue?  The input to the compare elimination 
pass looks like this:

(insn 195 194 196 11 (parallel [
(set (reg:QI 2 r2 [orig:38 byte.10_40 ] [38])
(mem:QI (post_inc:HI (reg/v/f:HI 0 r0 [orig:41 p ] [41])) [0 
MEM[base: p_38, offset: 65535B]+0 S1 A8]))
(clobber (reg:CC 16 cc))
]) "../../../../gcc/libgcc/unwind-pe.h":166 18 {*mov_noccqi}
 (expr_list:REG_INC (reg/v/f:HI 0 r0 [orig:41 p ] [41])
(nil)))
(insn 196 195 197 11 (set (reg:CC 16 cc)
(compare:CC (reg:QI 2 r2 [orig:38 byte.10_40 ] [38])
(const_int 0 [0]))) "../../../../gcc/libgcc/unwind-pe.h":166 7 
{*cmpqi}
 (nil))

and the compare (insn 196) was eliminated because there is a pattern for that 
load operation that sets CC the right way.  However... the only reason the load 
was done was to provide the input to the compare.  So by eliminating the 
compare, the load no longer serves a purpose, as indicated by the REG_UNUSED 
note in the DSE output.  Is that why it complained? 

If that's the issue, what can I do to make it go away?  One answer would be not 
to have that load; the compare can accept a memory operand just fine, 
presumably the costs need adjusting.  But it seems a bit odd to rely on costs 
to avoid an ICE.

paul



<    1   2   3   4   5   >