[committed][SH] Fix 101737

2024-03-02 Thread Oleg Endo
Hi,

The attached patch should fix PR 101737.  It's a rather obvious oversight. 
Sanity tested with 'make all-gcc'.  Committed to master, gcc-13, gcc-12,
gcc-11.

Cheers,
Oleg


gcc/ChangeLog:
PR target/101737
* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
is not an insn, but e.g. a code label.
From 4ff8ffe7331cf174668cf5c729fd68ff327ab014 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Sun, 3 Mar 2024 14:58:58 +0900
Subject: [PATCH] SH: Fix 101737

gcc/ChangeLog:
	PR target/101737
	* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
	is not an insn, but e.g. a code label.
---
 gcc/config/sh/sh.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 2c4..ef3c2e6 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -11766,9 +11766,10 @@ sh_insn_operands_modified_between_p (rtx_insn* operands_insn,
negates the T bit and stores the result in the T bit.  */
 bool
 sh_is_nott_insn (const rtx_insn* i)
 {
-  return i != NULL && GET_CODE (PATTERN (i)) == SET
+  return i != NULL_RTX && PATTERN (i) != NULL_RTX
+	 && GET_CODE (PATTERN (i)) == SET
 	 && t_reg_operand (XEXP (PATTERN (i), 0), VOIDmode)
 	 && negt_reg_operand (XEXP (PATTERN (i), 1), VOIDmode);
 }
 
--
libgit2 1.6.4



Re: [PATCH] m68k: restore bootstrap

2024-02-18 Thread Oleg Endo
On Sun, 2024-02-18 at 08:42 -0700, Jeff Law wrote:
> 
> On 2/18/24 02:18, Mikael Pettersson wrote:
> > m68k fails to bootstrap since -ffold-mem-offsets was introduced,
> > in what looks like wrong-code during stage2.
> > 
> > To restore bootstrap this disables -ffold-mem-offsets on m68k.
> > It's not ideal, but better than keeping bootstraps broken until
> > the root cause is debugged and fixed.
> > 
> > Tested with a bootstrap and regression test run on m68k-linux-gnu.
> > 
> > Ok for master? (I'll need help getting it committed.)
> > 
> > gcc/
> > PR target/113357
> > * config/m68k/m68k.cc (m68k_option_override): Disable
> > -ffold-mem-offsets.  Fix typo in comment.
> Definitely not OK.This needs to be debugged further, just disabling 
> the pass is not the right solution here.
> 
> It is also worth noting I'm bootstrapping and regression testing the 
> m68k weekly.
> 
> 

Jeff, could you please consider sharing your test setup so that others can
reproduce it as well?

I'd be really better if more people had access to a unified test setup and
methodology.

Best regards,
Oleg Endo


Re: [committed] Adjust expectations for pr59533-1.c

2024-01-21 Thread Oleg Endo


On Sun, 2024-01-21 at 19:14 -0700, Jeff Law wrote:
> The change for pr111267 twiddled code generation for sh/pr59533-1.c
> 
> We end up eliminating two comparisons, but require two shll instructions 
> to do so.  And in a couple places we're using an addc sequence rather 
> than a subc sequence.   This patch adjusts the expected codegen for the 
> test as all those are either a wash or a
> 
> The fwprop change does cause some code regressions on the same test. 
> I'll file a distinct but for that issue.
> 
> Pushed to the trunk,
> 
> Jeff

Thanks for keeping an eye on this.

Note that on SH4 the comparison insns are of MT type, which increases
likelihood of parallel execution.  So it's better to use those e.g. to shift
out the MSB into T bit than shll.

Cheers,
Oleg


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo
On Sun, 2023-11-19 at 19:51 -0700, Jeff Law wrote:
> 
> On 11/19/23 18:22, Oleg Endo wrote:
> > 
> > On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> > > This is work originally started by Joern @ Embecosm.
> > > 
> > > There's been a long standing sense that we're generating too many
> > > sign/zero extensions on the RISC-V port.  REE is useful, but it's really
> > > focused on a relatively narrow part of the extension problem.
> > > 
> > > What Joern's patch does is introduce a new pass which tracks liveness of
> > > chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
> > > and 32..63.
> > > 
> > > If it encounters a sign/zero extend that sets bits that are never read,
> > > then it replaces the sign/zero extension with a narrowing subreg.  The
> > > narrowing subreg usually gets eliminated by subsequent passes (it's just
> > > a copy after all).
> > > 
> > 
> > Have you tried it on SH, too?  (and if so any numbers?)


> Just bootstrap with C regression testing on sh4/sh4eb.  No data on 
> improvements.
> 

Alright.  I'll check what it does for SH once it's in.

Cheers,
Oleg


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo


On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> This is work originally started by Joern @ Embecosm.
> 
> There's been a long standing sense that we're generating too many 
> sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
> focused on a relatively narrow part of the extension problem.
> 
> What Joern's patch does is introduce a new pass which tracks liveness of 
> chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
> and 32..63.
> 
> If it encounters a sign/zero extend that sets bits that are never read, 
> then it replaces the sign/zero extension with a narrowing subreg.  The 
> narrowing subreg usually gets eliminated by subsequent passes (it's just 
> a copy after all).
> 

Have you tried it on SH, too?  (and if so any numbers?)

It sounds like this one would be great to remove some of the sign/zero
extension removal hackery that I've accumulated in the SH backend.

Cheers,
Oleg


[SH][committed] Fix PR 111001

2023-10-23 Thread Oleg Endo
The attached patch fixes PR 111001.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.
Bootstrapped on GCC-13 sh4-linux by Adrian.

Cheers,
Oleg

gcc/ChangeLog:

PR target/111001
* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
Skip over nop move insns.

From 4414818f4e5de54ea3c353e2ebb2e79a89ae211b Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Mon, 23 Oct 2023 22:08:37 +0900
Subject: [PATCH] SH: Fix PR 111001

gcc/ChangeLog:

	PR target/111001
	* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
	Skip over nop move insns.
---
 gcc/config/sh/sh_treg_combine.cc |  9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh_treg_combine.cc b/gcc/config/sh/sh_treg_combine.cc
index f6553c0..685ca54 100644
--- a/gcc/config/sh/sh_treg_combine.cc
+++ b/gcc/config/sh/sh_treg_combine.cc
@@ -731,9 +731,16 @@ sh_treg_combine::record_set_of_reg (rtx reg, rtx_insn *start_insn,
 	  new_entry.cstore_type = cstore_inverted;
 	}
   else if (REG_P (new_entry.cstore.set_src ()))
 	{
-	  // If it's a reg-reg copy follow the copied reg.
+	  // If it's a reg-reg copy follow the copied reg, but ignore
+	  // nop copies of the reg onto itself.
+	  if (REGNO (new_entry.cstore.set_src ()) == REGNO (reg))
+	{
+	  i = prev_nonnote_nondebug_insn_bb (i);
+	  continue;
+	}
+
 	  new_entry.cstore_reg_reg_copies.push_back (new_entry.cstore);
 	  reg = new_entry.cstore.set_src ();
 	  i = new_entry.cstore.insn;
 
--
libgit2 1.3.2



[SH][committed] Fix PR 101177

2023-10-20 Thread Oleg Endo
The attached patch fixes PR 101177.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.

Cheers,
Oleg

gcc/ChangeLog:

PR target/101177
* config/sh/sh.md (unnamed split pattern): Fix comparison of
find_regno_note result.

From 3ce4e99303d01d348229cca22bf8d3dd63004e01 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Fri, 20 Oct 2023 18:48:34 +0900
Subject: [PATCH] SH: Fix PR 101177

Fix accidentally inverted comparison.

gcc/ChangeLog:

	PR target/101177
	* config/sh/sh.md (unnamed split pattern): Fix comparison of
	find_regno_note result.
---
 gcc/config/sh/sh.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 76e7774..93374c6 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -841,9 +841,9 @@
   rtx reg = operands[0];
   if (SUBREG_P (reg))
 reg = SUBREG_REG (reg);
   gcc_assert (REG_P (reg));
-  if (find_regno_note (curr_insn, REG_DEAD, REGNO (reg)) != NULL_RTX)
+  if (find_regno_note (curr_insn, REG_DEAD, REGNO (reg)) == NULL_RTX)
 FAIL;
 
   /* FIXME: Maybe also search the predecessor basic blocks to catch
  more cases.  */
--
libgit2 1.3.2



Re: RISC-V: Added support for CRC.

2023-09-26 Thread Oleg Endo
On Sun, 2023-09-24 at 00:05 +0100, Joern Rennecke wrote:
> 
> Although maybe Oleg Endo's library, as mentioned in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591748.html ,
> might be suitable?  What is the license for that?
> 
> 

I haven't published the library, but I think I could do that.

It's a C++-14 header-only thing and uses templates + constexpr to generate
the .rodata lookup tables.  It's convenient for an application project, as
it doesn't require any generator tool in the build.  This might be not a big
advantage in the context of GCC.

Since the tables are computed during compile-time, there is no particular
optimization implemented.  The run-time function is also nothing fancy:

static constexpr uint8_t table_index (value_type rem, uint8_t x)
{
  if (ReflectInput)
return x ^ rem;
  else
return x ^ (BitCount > 8 ? (rem >> (BitCount - 8))
 : (rem << (8 - BitCount)));
}

static constexpr value_type shift (value_type rem)
{
  return ReflectInput ? rem >> 8 : rem << 8;
}

static value_type
default_process_bytes (value_type rem, const uint8_t* in, const uint8_t* in_end)
{
  for (; in != in_end; ++in)
  {
auto i = table_index (rem, *in);
rem = table[i] ^ shift (rem);
  }
  return rem;
}

Anyway, let me know if anyone is interested.

Cheers,
Oleg


[SH][committed] Fix PR 101469

2023-07-13 Thread Oleg Endo
Hi,

The attached patch fixes PR 101469.
Tested by the original reporter Rin Okuyama on NetBSD with GCC 10.5.
Applied to master, GCC 11, GCC 12, GCC 13 after 'make all' sanity check.

Cheers,
Oleg


gcc/ChangeLog:

PR target/101469
* config/sh/sh.md (peephole2): Handle case where eliminated reg
is also used by the address of the following memory
operand.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 4622dba0121..76e7774cef3 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -10680,6 +10680,45 @@
&& peep2_reg_dead_p (2, operands[1]) && peep2_reg_dead_p (3, operands[0])"
   [(const_int 0)]
 {
+  if (MEM_P (operands[3]) && reg_overlap_mentioned_p (operands[0], operands[3]))
+{
+  // Take care when the eliminated operand[0] register is part of
+  // the destination memory address.
+  rtx addr = XEXP (operands[3], 0);
+
+  if (REG_P (addr))
+	operands[3] = replace_equiv_address (operands[3], operands[1]);
+
+  else if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0))
+	   && CONST_INT_P (XEXP (addr, 1))
+	   && REGNO (operands[0]) == REGNO (XEXP (addr, 0)))
+	operands[3] = replace_equiv_address (operands[3],
+			gen_rtx_PLUS (SImode, operands[1], XEXP (addr, 1)));
+
+  else if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0))
+	   && REG_P (XEXP (addr, 1)))
+{
+  // register + register address  @(R0, Rn)
+  // can change only the Rn in the address, not R0.
+  if (REGNO (operands[0]) == REGNO (XEXP (addr, 0))
+	  && REGNO (XEXP (addr, 0)) != 0)
+	{
+	  operands[3] = replace_equiv_address (operands[3],
+			gen_rtx_PLUS (SImode, operands[1], XEXP (addr, 1)));
+	}
+  else if (REGNO (operands[0]) == REGNO (XEXP (addr, 1))
+		   && REGNO (XEXP (addr, 1)) != 0)
+{
+	  operands[3] = replace_equiv_address (operands[3],
+			gen_rtx_PLUS (SImode, XEXP (addr, 0), operands[1]));
+}
+  else
+FAIL;
+}
+  else
+FAIL;
+}
+
   emit_insn (gen_addsi3 (operands[1], operands[1], operands[2]));
   sh_peephole_emit_move_insn (operands[3], operands[1]);
 })


Re: RFA: crc builtin functions & optimizations

2022-03-14 Thread Oleg Endo
On Mon, 2022-03-14 at 18:04 -0700, Andrew Pinski via Gcc-patches wrote:
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
>  wrote:
> > 
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek out to
> > rectify this,
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> > 
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a
> > lookup table.  I am sure once we have the basic infrastructure in the
> > tree, we'll get more
> > contributions of suitable named patterns for various ports.
> 
> 
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.
> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?
> Does it make sense to have to list the most common polynomials in the
> documentation?
> 
> Also I am sorry but micro-optimizing coremarks is just wrong. Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.
> 
> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciocal.

In my own CRC library I've got ~30 'commonly used' CRC types, based on
the following generic definition:

template <
  // number of crc result bits (polynomial order in bits)
  unsigned int BitCount,

  // normal polynomial without the leading 1 bit.
  typename crc_impl_detail::select_int::type TruncPoly,

  // initial remainder
  typename crc_impl_detail::select_int::type InitRem = 0,

  // final xor value
  typename crc_impl_detail::select_int::type FinalXor = 0,

  // input data byte reflected before processing (LSB / MSB first)
  bool ReflectInput = false,

  // output CRC reflected before the xor
  bool ReflectRemainder = false >
class crc
{
...
};


and then it goes like ...

// CRC-1 (most hardware; also known as parity bit)
// x + 1
typedef crc < 1, 0x01 > crc_1;

// CRC-3
typedef crc < 3, 0x03, 0x07, 0x00, true, true> crc_3;

...

// CRC-32 (ISO 3309, ANSI X3.66, FIPS PUB 71, FED-STD-1003, ITU-T V.42, 
Ethernet, SATA, MPEG-2, Gzip, PKZIP, POSIX cksum, PNG, ZMODEM)
// x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + 
x^4 + x^2 + x + 1
typedef crc < 32, 0x04C11DB7, 0x, 0x, true, true > crc_32;

typedef crc < 32, 0x04C11DB7, 0x7FFF, 0x7FFF, false, false > 
crc_32_mpeg2;
typedef crc < 32, 0x04C11db7, 0x, 0x, false, false > 
crc_32_posix;

...


It then generates the lookup tables at compile time into .rodata for
the types that are used in the program, which is great for MCUs with
more flash/ROM than RAM.

Specific CRC types can be overridden if the system has a better way to
calculate the CRC, e.g. as hardware peripheral.

This being a library makes it relatively easy to tune and customize for
various systems.

How would that work together with your proposal?

Cheers,
Oleg



Re: [PATCH] sh-linux fix target cpu

2022-01-30 Thread Oleg Endo
On Fri, 2022-01-28 at 15:18 -0700, Jeff Law via Gcc-patches wrote:
> 
> On 1/12/2022 2:02 AM, Yoshinori Sato wrote:
> > sh-linux not supported any SH1 and SH2a little-endian.
> > Add exceptios it.
> > 
> > gcc/ChangeLog:
> > 
> > * config/sh/t-linux (MULTILIB_EXCEPTIONS): Add m1, mb/m1 and m2a.
> Thanks.  Technically this is probably too late to make gcc-12 as we're 
> in stage4 (regression fixes only).  BUt it was posted during stage3 
> (general bugfixing) and is very very low risk.
> 
> I went ahead and committed it for you.
> 
> Thanks, and sorry for the delays.


Thanks, Jeff!

Cheers,
Oleg



Re: [PATCH 10/11] sh: Update unexpected empty split condition

2021-06-01 Thread Oleg Endo
On Wed, 2021-06-02 at 00:05 -0500, Kewen Lin wrote:
> gcc/ChangeLog:
> 
>   * config/sh/sh.md (doloop_end_split): Fix empty split condition.
> ---
>  gcc/config/sh/sh.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index e3af9ae21c1..93ee7c9a7de 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -6424,7 +6424,7 @@ (define_insn_and_split "doloop_end_split"
> (clobber (reg:SI T_REG))]
>"TARGET_SH2"
>"#"
> -  ""
> +  "&& 1"
>[(parallel [(set (reg:SI T_REG)
>  (eq:SI (match_dup 2) (const_int 1)))
> (set (match_dup 0) (plus:SI (match_dup 2) (const_int -1)))])

This is OK (obvious).

Cheers,
Oleg



Re: PING [PATCH] RX new builtin function

2020-08-10 Thread Oleg Endo
On Mon, 2020-08-10 at 13:51 +0300, Darius Galis wrote:
> 
> I've found the following patch 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-11/msg00983.html, but it 
> is not in the latest sources.
> Could please let me know why it was not added? I'm willing to do any 
> rework necessary in order for it to be accepted to the latest sources.

I think it'd be better to fix and/or improve the backend code so that
the compiler generates the bset instruction automatically.  Otherwise
this built-in is not very useful for user code.  Except for the use
case of atomic-or on byte memory location, as a "side effect".  But
that should be implemented as such -- as atomics.

There are a couple of PRs that might be relevant/interesting

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93587
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83832
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81823

Cheers,
Oleg



Re: [committed] Fix latent bug due to peephole2 pass dropping REG_INC notes

2020-05-31 Thread Oleg Endo
Hi Jeff,

On Sun, 2020-05-31 at 11:20 -0600, Jeff Law via Gcc-patches wrote:
> 
> The peephole2 pass makes some attempt to update various notes, but that 
> doesn't
> include REG_INC notes.  While I could trivially fix this in the H8 port, I
> wouldn't be terribly surprised if the lack of a REG_INC note could cause 
> problems
> on other ports.  So I think it's best to fix in the peephole pass.
> 
> As it turns out we already have  a function (two copies even!) to scan an insn
> for auto-inc addressing modes and add an appropriate note.
> 
> This patch moves that code from reload & lra into rtlanal.c and calls it from 
> the
> peephole pass.

I ran into this issue a while ago.
See also config/sh/sh.c, function sh_check_add_incdec_notes.

Is it possible to somehow fold all that into a universal solution?

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-03-07 Thread Oleg Endo
On Thu, 2020-03-05 at 08:51 -0700, Jeff Law wrote:
> 
> FWIW I've got an sh4/sh4eb bootstrap and regression test running with
> HONOR_REG_ALLOC_ORDER defined.  As Vlad mentioned, that may be a
> viable workaround.
> 

I've had a look at the good old CSiBE code size results and poked at
some of the cases.  Overall, it seems to help code quality when
HONOR_REG_ALLOC_ORDER is defined on SH.

sum:  3383449 -> 3379629-3820 / -0.112903 %
avg: -212.22 / -0.271573 %
max: flex-2.5.31  253514 -> 253718+204 / +0.080469 %
min: bzip2-1.0.2   67202 -> 65938-1264 / -1.880896 %


However, even with HONOR_REG_ALLOC_ORDER defined, the simple test case
from PR 81426 https://gcc.gnu.org/bugzilla/attachment.cgi?id=47159
fails to compile without -mlra (use options -m4 -matomic-model=soft-gusa on 
regular non-linux sh-elf cross compiler).

How about the bootstrap, Jeff?  Did it help anything?

Cheers,
Oleg




Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 12:35 -0700, Jeff Law wrote:
> 
> Yup.  That was roughly what I was thinking and roughly the worry I had with
> trying to squash out the quality regressions.  But it may ultimately be the
> only way to really resolve these issues.

Another idea would be to let RA see R0, but ignore all the R0
constraints.  Then try fixing up everything afterwards.  If R0 is
removed from the allocatable reg list, there will be one register less
for it to work with and I'd expect some code quality regressions.  But
in order to fix up all the R0 cases after the regular RA/reload, I
believe it will have to re-do a lot of (similar) work that has been
done by the regular RA already.  One thing that comes instantly to mind
are loops and the use of R0 as index/base register in memory addressing
... it just sounds like a lot of duplicate work in general.

> 
> DJ's work on the m32c IIRC might be useful if you do try to chase this stuff
> down.  Essentially there weren't really enough registers.  So he had the port
> pretend to have more than it really did, then had a post-reload pass to do the
> final allocation into the target's actual register file.
> 

AFAIK DJ did the same (or similar) thing for RL78.  IMHO that just
shows that one type of RA/reload does not fit all.  Perhaps it'd be
better to have the option of different RA/reload implementations, which
implement different strategies for different needs and priorities.

Anyway, on SH the R0 problem seems to go away with LRA for the most
part.  I don't know if anything has been put in LRA specifically to
address such cases, or it works by general definition of the design, or
it's just a mere coincidence.  If it's the latter case, I'm not sure
what to expect in the future.  Perhaps it will start breaking again if
changes for other targets are being made to LRA.

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 09:38 -0700, Jeff Law wrote:
> 
> It really would have just been a workaround for some of the R0 issues anyway. 
> I think at its core R0 on the SH probably needs to be treated more like a
> temporary rather than a general register.  But that's probably a huge change,
> both in terms of just getting it working right and in terms of addressing the
> code quality regressions that would introduce.
> 

I think one of the major issues is that R0 is a constraint in several
addressing modes for memory accesses.  I believe I once had the idea of
hiding R0 from RA ... then insert reg-reg copies (to load R0) after
RA/reload ... and then somehow do back propagation to get rid of the
reg-reg copies again.  Another idea was to run a pre-RA pass to pre-
allocate all R0 things.  But I think it's all just running in sqrt(1)
circles after all.

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 08:57 -0700, Jeff Law wrote:
> 
> > It could open a can of worms.  Off the top of my head, R0 is used to
> > hold the function return value, and R0:R1 to return structs with sizeof
> > > 4 bytes.  So if DImode is allocated to R0, it implicitly uses R0:R1,
> > 
> > AFAIR, doesn't it?  Would that kind of thing cause troubles?
> 
> It might.  We might have to move a pair or even a quad if you have modes that
> cover r0-r3. It may not be feasible in practice.  I was just thinking off the
> top of my head.
> 

Yeah, for instance 'double _Complex' will be returned in R0-R3 when
compiling for 'without FPU'.  How about adding a target hook or look-up 
table (default 1:1 mapping for other targets)?  Would that be an
option?

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 08:47 -0700, Jeff Law wrote:
> 
> It's almost certainly the case that the recent IRA changes are going to stress
> R0 more.  If I'm reading what Vlad did correctly, one of the tie-breakers its
> using now is to choose the lowest numbered register when all else is equal.  
> So
> R0 on SH is likely going to be more problematical.
> 
> I wonder if just reordering the regs on the SH (and adjusting the debug output
> to keep that working) would be enough to mitigate some of the R0 problems.

It could open a can of worms.  Off the top of my head, R0 is used to
hold the function return value, and R0:R1 to return structs with sizeof
> 4 bytes.  So if DImode is allocated to R0, it implicitly uses R0:R1,
AFAIR, doesn't it?  Would that kind of thing cause troubles?

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Fri, 2020-02-28 at 13:24 -0700, Jeff Law wrote:
> This change:
> 
> > commit 3133bed5d0327e8a9cd0a601b7ecdb9de4fc825d
> > Author: Vladimir N. Makarov 
> > Date:   Sun Feb 23 16:20:05 2020 -0500
> > 
> > Changing cost propagation and ordering colorable bucket
> > heuristics for
> > PR93564.
> > 
> > 2020-02-23  Vladimir Makarov  
> > 
> > PR rtl-optimization/93564
> > * ira-color.c (struct update_cost_queue_elem): New
> > member start.
> > (queue_update_cost, get_next_update_cost): Add new arg
> > start.
> > (allocnos_conflict_p): New function.
> > (update_costs_from_allocno): Add new arg
> > conflict_cost_update_p.
> > Add checking conflicts with allocnos_conflict_p.
> > (update_costs_from_prefs, restore_costs_from_copies):
> > Adjust
> > update_costs_from_allocno calls.
> > (update_conflict_hard_regno_costs): Add checking
> > conflicts with
> > allocnos_conflict_p.  Adjust calls of queue_update_cost
> > and
> > get_next_update_cost.
> > (assign_hard_reg): Adjust calls of
> > queue_update_cost.  Add
> > debugging print.
> > (bucket_allocno_compare_func): Restore previous
> > version.
> > 
> 
> Is causing c-torture/compile/sync-1 to fail with an ICE on sh4eb
> (search for
> "Tests that now fail, but worked before":
> 
> 
> http://3.14.90.209:8080/job/sh4eb-linux-gnu/lastFailedBuild/console
> 
> 
> In the .log we have:
> 
> > /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1:
> > error:
> > unable to find a register to spill in class 'R0_REGS'^M
> > /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1:
> > error: this
> > is the insn:^M
> > (insn 209 207 212 2 (parallel [^M
> > (set (subreg:SI (reg:HI 431) 0)^M
> > (unspec_volatile:SI [^M
> > (mem/v:HI (reg/f:SI 299) [-1  S2 A16])^M
> > (subreg:HI (reg:SI 6 r6 [orig:425 uc+-3 ]
> > [425]) 2)^M
> > (reg:HI 5 r5 [orig:428 sc+-1 ] [428])^M
> > ] UNSPECV_CMPXCHG_1))^M
> > (set (mem/v:HI (reg/f:SI 299) [-1  S2 A16])^M
> > (unspec_volatile:HI [^M
> > (const_int 0 [0])^M
> > ] UNSPECV_CMPXCHG_2))^M
> > (set (reg:SI 147 t)^M
> > (unspec_volatile:SI [^M
> > (const_int 0 [0])^M
> > ] UNSPECV_CMPXCHG_3))^M
> > (clobber (scratch:SI))^M
> > (clobber (reg:SI 0 r0))^M
> > (clobber (reg:SI 1 r1))^M
> > ]) "/home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-
> > 1.c":245:8 
> > 406 {atomic_compare_and_swaphi_soft_gusa}^M
> >  (expr_list:REG_DEAD (reg:HI 5 r5 [orig:428 sc+-1 ] [428])^M
> > (expr_list:REG_DEAD (reg:SI 6 r6 [orig:425 uc+-3 ] [425])^M
> > (expr_list:REG_DEAD (reg/f:SI 299)^M
> > (expr_list:REG_UNUSED (reg:HI 431)^M
> > (expr_list:REG_UNUSED (reg:SI 1 r1)^M
> > (expr_list:REG_UNUSED (reg:SI 0 r0)^M
> > (nil^M
> > 
> 
> You should be able to trigger it with a cross compiler at -O2 with
> the attached
> testcase.
> 
> This could well be a target issue.  I haven't tried to debug it.  If
> it's a
> target issue, I'm fully comfortable punting it to the SH folks for
> resolving.

The R0_REGS spill failure is a general problem, in particular with old
reload.  The atomic patterns tend to trigger it in one circumstance or
the other.  The IRA change probably just stresses it more.  Perhaps it
will go away with -mlra.

However, LRA on SH still has its own issues, so it can't be generally
enabled by default yet, unfortunately.  See also some of the recent
posts in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

Cheers,
Oleg



Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 16:09 +0100, John Paul Adrian Glaubitz wrote:
> Hi!
> 
> On 12/13/19 4:06 PM, Oleg Endo wrote:
> > > What are the combine2 patches?
> > 
> > See the other thread that I've linked in my message.
> 
> I don't see any patch there.

You'd have to crawl up the discussion or so.
And I think there were a couple of versions.  Anyway, I don't think it
made it into trunk yet. 

> > 
> > Have you tried rebuilding debian on/for SH with -mlra enabled for
> > *everything*?  Do you have an easy way of doing that?  It would be
> > interesting to see how it goes.
> 
> Yes, that would be possible. We would have to enable -mlra in gcc by
> default and then trigger a rebuild for 10.000 source packages. But
> that would take a while to finish.
> 

Better start now then :)
No hurry though.

Cheers,
Oleg



Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 15:57 +0100, John Paul Adrian Glaubitz wrote:
> Hello Segher!
> 
> > With LRA, sh builds fine (with the combine2 patches).  I have no idea
> > if correct code is generated, but it doesn't ICE anymore.
> 
> What are the combine2 patches?

See the other thread that I've linked in my message.


>  And I would support switching SH to LRA as
> there are a few cases (Debian packages) where GCC fails with an internal
> compiler error which I reported to the GCC bugzilla.

Have you tried rebuilding debian on/for SH with -mlra enabled for
*everything*?  Do you have an easy way of doing that?  It would be
interesting to see how it goes.

Cheers,
Oleg





Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 08:09 -0600, Segher Boessenkool wrote:
> On Fri, Dec 13, 2019 at 10:06:20PM +0900, Oleg Endo wrote:
> > On Fri, 2019-12-13 at 05:03 -0600, Segher Boessenkool wrote:
> > > On Thu, Dec 12, 2019 at 09:32:27AM +, Richard Sandiford
> > > wrote:
> > > > I doubt it will be long before we deprecate
> > > > all targets that require old reload.)
> > > 
> > > Do we wait until GCC 12 (to remove old reload completely)?  If
> > > not, we
> > > should deprecate it now.
> > > 
> > 
> > Segher, could you please re-run your tests on SH with -mlra as
> > mentioned here?
> > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00133.html
> > 
> > I'm thinking to make -mlra the default on SH.
> 
> With LRA, sh builds fine (with the combine2 patches).  I have no idea
> if correct code is generated, but it doesn't ICE anymore.
> 

Great, thanks for checking.  I'll try to run some more tests.

Cheers,
Oleg



Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 05:03 -0600, Segher Boessenkool wrote:
> On Thu, Dec 12, 2019 at 09:32:27AM +, Richard Sandiford wrote:
> > I doubt it will be long before we deprecate
> > all targets that require old reload.)
> 
> Do we wait until GCC 12 (to remove old reload completely)?  If not, we
> should deprecate it now.
> 

Segher, could you please re-run your tests on SH with -mlra as
mentioned here?
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00133.html

I'm thinking to make -mlra the default on SH.

Cheers,
Oleg





Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-07 Thread Oleg Endo
On Tue, 2019-11-26 at 07:38 +0100, ste...@franke.ms wrote:
> > On 11/21/19 10:30 AM, ste...@franke.ms wrote:
> > > Hi there,
> > > 
> > > here is mc68k's patch to switch the m68k architecture over to ccmode and
> > > lra. See https://github.com/mc68kghost/gcc 68k-ccmode branch.
> > 
> > Bernd Schmidt posted a conversion of the m68k port to ccmode a couple
> > weeks before yours.  We've already ACK'd it for installing onto the trunk.
> > 
> > Jeff
> 
> To be honest:
> - 8 days is hardly "a couple weeks before"
> - ccmode is not the same as ccmode+lra
> 
> The paperwork for contributing to fsf is on the way and the repo at
> https://github.com/mc68kghost/gcc got an update. Tests are not yet at 100%
> (master branch fails too many tests) but it's closer to master branch now.
> The code is to 50% identical, a fair amount has swapped cmp/bcc, few are a
> tad worse and some yield surprisingly better code.
> 

You can still submit patches for further improvements, like adding
support for LRA.  Now that the main CCmode conversion is on trunk and
has been confirmed and tested, it should be much easier for you to
pinpoint problems in your changes.

Cheers,
Oleg



Re: Add a new combine pass

2019-12-06 Thread Oleg Endo
On Fri, 2019-12-06 at 16:51 -0600, Segher Boessenkool wrote:
> On Wed, Dec 04, 2019 at 07:43:30PM +0900, Oleg Endo wrote:
> > On Tue, 2019-12-03 at 12:05 -0600, Segher Boessenkool wrote:
> > > > Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> > > > explicitly, but it's got a R0_REGS class with only one said reg in it. 
> > > > So the default impl of class_likely_spilled should do its thing.
> > > 
> > > Yes, good point.  So what happened here?
> > 
> > "Something, somewhere, went terribly wrong"...
> > 
> > insn 18 wants to do
> > 
> > mov.l @(r4,r6),r0
> > 
> > But it can't because the reg+reg address mode has a R0 constraint
> > itself.  So it needs to be changed to
> > 
> > mov   r4,r0
> > mov.l @(r0,r6),r0
> > 
> > And it can't handle that.  Or only sometimes?  Don't remember.
> > 
> > >   Is it just RA messing things
> > > up, unrelated to the new pass?
> > 
> > Yep, I think so.  The additional pass seems to create "tougher" code so
> > reload passes out earlier than usual.  We've had the same issue when
> > trying address mode selection optimization.  In fact that was one huge
> > showstopper.
> 
> So maybe you should have a define_insn_and_split that allows any two
> regs and replaces one by r0 if neither is (and a move to r0 before the
> load)?  Split after reload of course.
> 
> It may be admitting defeat, but it may even result in better code as
> well ;-)
> 

AFAIR I've tried that already and it was just like running in circles. 
Means it didn't help.  Perhaps if R0_REGS was hidden from RA altogether
it might work.  But that sounds like opening a whole other can of
worms.  Another idea I was entertaining was to do a custom RTL pass to
pre-allocate all R0 constraints before the real full RA.  But then the
whole reload stuff would still have the same issue as above.  So all
the wallpapering is just moot.  Proper fix of the actual problem would
be more appropriate.

Cheers,
Oleg



Re: Add a new combine pass

2019-12-04 Thread Oleg Endo
On Tue, 2019-12-03 at 12:05 -0600, Segher Boessenkool wrote:
> On Tue, Dec 03, 2019 at 10:33:48PM +0900, Oleg Endo wrote:
> > On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> > > 
> > > > > - sh (that's sh4-linux):
> > > > > 
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > > > 'snmp_get_cpu_field':
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to 
> > > > > find a register to spill in class 'R0_REGS'
> > > > >  1638 | }
> > > > >   | ^
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > > > insn:
> > > > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > > > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > > > (nil
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > > > errors, bailing out
> > > > 
> > > > Would have to look more at this one.  Seems odd that it can't allocate
> > > > R0 when it's already the destination and when R0 can't be live before
> > > > the insn.  But there again, this is reload, so my enthuasiasm for 
> > > > looking
> > > > is a bit limited :-)
> > > 
> > > It wants to use r0 in some other insn, so it needs to spill it here, but
> > > cannot.  This is what class_likely_spilled is for.
> > 
> > Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> > explicitly, but it's got a R0_REGS class with only one said reg in it. 
> > So the default impl of class_likely_spilled should do its thing.
> 
> Yes, good point.  So what happened here?

"Something, somewhere, went terribly wrong"...

insn 18 wants to do

mov.l @(r4,r6),r0

But it can't because the reg+reg address mode has a R0 constraint
itself.  So it needs to be changed to

mov   r4,r0
mov.l @(r0,r6),r0

And it can't handle that.  Or only sometimes?  Don't remember.


>   Is it just RA messing things
> up, unrelated to the new pass?
> 

Yep, I think so.  The additional pass seems to create "tougher" code so
reload passes out earlier than usual.  We've had the same issue when
trying address mode selection optimization.  In fact that was one huge
showstopper.

Cheers,
Oleg



Re: Add a new combine pass

2019-12-03 Thread Oleg Endo
On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> 
> > > - sh (that's sh4-linux):
> > > 
> > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > 'snmp_get_cpu_field':
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find 
> > > a register to spill in class 'R0_REGS'
> > >  1638 | }
> > >   | ^
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > insn:
> > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > (nil
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > errors, bailing out
> > 
> > Would have to look more at this one.  Seems odd that it can't allocate
> > R0 when it's already the destination and when R0 can't be live before
> > the insn.  But there again, this is reload, so my enthuasiasm for looking
> > is a bit limited :-)
> 
> It wants to use r0 in some other insn, so it needs to spill it here, but
> cannot.  This is what class_likely_spilled is for.
> 

Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
explicitly, but it's got a R0_REGS class with only one said reg in it. 
So the default impl of class_likely_spilled should do its thing.

LRA is available on SH and often fixes the R0 problems -- but not
always.  Maybe it got better over time, haven't checked.

Could you re-run the SH build tests with -mlra, please ?

Cheers,
Oleg




Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-23 Thread Oleg Endo
On Sat, 2019-11-23 at 10:36 -0700, Jeff Law wrote:
> 
> > Any news on this? I would be in favor of merging these patches as I
> > have
> > tested them successfully on Debian by building the gcc-snapshot
> > package
> > with the patches applied. I used all four patches plus the
> > additional one
> > required to be able to build the kernel.
> 
> Not really.  I've already indicated to Bernd that he should go ahead
> and
> commit the changes and we can iterate on any problems that arise.
> 
> > 
> > I did not see the bootstrap problems that Mikael reported and
> > Andreas has
> > reported that the issues is not reliably reproducible on Aranym. He
> > suspected
> > that it might be a bug in the MMU emulation in Aranym which only
> > triggers
> > randomly depending on the current memory layout.
> 
> Good to know it wasn't reproducible and might ultimately be a bug in
> aranym.
> 

Meanwhile, another patch that's supposed to do the same thing was
posted:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02131.html

Cheers,
Oleg



Re: [PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-08 Thread Oleg Endo
On Fri, 2019-11-08 at 13:27 +, Jozef Lawrynowicz wrote:
> 
> Yes, I should have used -flto in my examples. But it doesn't help remove these
> CRT library functions which are normally either directly added to the
> list of functions to run before main (via .init, .ctors or .init_array) or 
> used
> in functions which are themselves added to this list.
> 
> The unnecessary functions we want to remove are:
>   deregister_tm_clones
>   register_tm_clones
>   __do_global_dtors_aux
>   frame_dummy
> LTO can't remove any of them.
> 

Ah, right, good point.  That's not MSP430 specific actually.  For those
things I usually have custom init code, which also does other things
occasionally.  Stripping off global dtors is then an option in the
build system which takes care of it (in my case, I do it by modifying
the generated linker script).

But again, as with the exceptions, it might be better to implement
these kind of things outside of the compiler, e.g. by building the app
with -nostartfiles -nodefaultlibs and providing your own substitutes.

Another option is to patch those things in using the OS part of the
target triplet.

Cheers,
Oleg



Re: [PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-08 Thread Oleg Endo
On Thu, 2019-11-07 at 21:31 +, Jozef Lawrynowicz wrote:
> When building small programs for MSP430, the impact of the unused
> functions pulled in from the CRT libraries is quite noticeable. Most of these
> relates to feature that will never be used for MSP430 (Transactional memory,
> supporting shared objects and dynamic linking), or rarely used (exception
> handling).

There's a magic switch, which does the business, at least for me, most
of the time:

   -flto

If you're trying to bring down the executable size as much as possible,
but don't use -flto, I think something is wrong.

Cheers,
Oleg



Re: [PATCH 2/4] MSP430: Disable exception handling by default for C++

2019-11-07 Thread Oleg Endo
On Thu, 2019-11-07 at 21:37 +, Jozef Lawrynowicz wrote:
> The code size bloat added by building C++ programs using libraries containing
> support for exceptions is significant. When using simple constructs such as
> static variables, sometimes many kB from the libraries are unnecessarily
> pulled in.
> 
> So this patch disable exceptions by default for MSP430 when compiling for C++,
> by implicitly passing -fno-exceptions unless -fexceptions is passed.

It is extremely annoying when GCC's default standard behavior differs
across different targets.  And as a consequence, you have to add a load
of workarounds and disable other things, like fiddling with the
testsuite.  It's the same thing as setting "double = float" to get more
"speed" by default.

I would strongly advice against making such non-standard behaviors the
default in the vanilla compiler.  C++ normally has exceptions enabled. 
If a user doesn't want them and is willing to deal with it all the
consequences, then we already have a mechanism to do that:
 --fno-exceptions

Perhaps it's generally more useful to add a global configure option for
GCC to disable exception handling by default.  Then you can provide a
turn-key toolchain to your customers as well -- just add an option to
the configure line.

Cheers,
Oleg



Re: [PATCH][RFC] C++-style iterators for FOR_EACH_IMM_USE_STMT

2019-11-03 Thread Oleg Endo
On Wed, 2019-10-30 at 10:27 +0100, Richard Biener wrote:
> 
> Hmm, not sure - I'd like to write
> 
>  for (gimple *use_stmt : imm_stmt_uses (SSAVAR))
>for (use_operand_p use_p :  from 
> above>)
>  ...
> 
> I don't see how that's possible.  It would need to be "awkward" like
> 
>  for (auto it : imm_stmt_uses (SSAVAR))
>{
>  gimple *use_stmt = *it;
>  for (use_operand_p use_p : it)
>...
>}
> 
> so the first loops iteration object are the actual iterator and you'd
> have to do extra indirection to get at the actual stmt you iterated
> to.
> 
> So I'd extend C++ (hah) to allow
> 
>   for (gimple *use_stmt : imm_stmt_uses (SSAVAR))
> for (use_operand_p use_p : auto)
>   ...
> 
> where 'auto' magically selects the next iterator object in scope
> [that matches].
> 
> ;)

Have you applied for a patent yet? :D

How about this one?

for (gimple* use_stmt : imm_stmt_uses (SSAVAR))
  for (use_operand_p use_p : imm_uses_on_stmt (*use_stmt))

... where helper function "imm_uses_on_stmt" returns a range object
that offers a begin and end function and its own iterator type.


Another concept that could be interesting are filter iterators.

We used a simplistic re-implementation (c++03) to avoid dragging in
boost when working on AMS
https://github.com/erikvarga/gcc/blob/master/gcc/filter_iterator.h

Example uses are
https://github.com/erikvarga/gcc/blob/master/gcc/ams.h#L845
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc#L3715


I think there are also some places in RTL where filter iterators could
be used, e.g. "iterate over all MEMs in an RTL" could be made to look
something like that:

  for (auto&& i : filter_rtl (my_rtl_obj, MEM_P))
   ...


Anyway, maybe it can plant some ideas.

Cheers,
Oleg



Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-11-03 Thread Oleg Endo
On Fri, 2019-10-11 at 23:27 +0900, Oleg Endo wrote:
> On Thu, 2019-10-03 at 19:34 -0600, Jeff Law wrote:
> > 
> > So probably the most interesting target for this test is v850-elf
> > as
> > it's got a reasonably well functioning simulator, hard and soft FP
> > targets, little endian, and I'm familiar with its current set of
> > failures.
> > 
> > I can confirm that your patch makes no difference in the test
> > results
> > (which includes execution results).
> > 
> > In fact, there haven't been any problems on any target in my tester
> > that
> > I can tie back to this change.
> > 
> > At this point I'd say let's go for it.
> > 
> 
> Thanks, Jeff.  I'll commit it to trunk if there are no further
> objections some time next week.
> 

I've just committed it as r277752.

Personally I'd like to install it on GCC 8 and 9 branches as well.
Any thoughts on that?

Cheers,
Oleg




Re: [PATCH][RFC] C++-style iterators for FOR_EACH_IMM_USE_STMT

2019-10-29 Thread Oleg Endo
On Tue, 2019-10-29 at 11:26 +0100, Richard Biener wrote:
> While I converted other iterators requiring special BREAK_FROM_XYZ
> a few years ago FOR_EACH_IMM_USE_STMT is remaining.  I've pondered
> a bit but cannot arrive at a "nice" solution here with just one
> iterator as the macros happen to use.  For reference, the macro use
> is
> 
>   imm_use_iterator iter;
>   gimple *use_stmt;
>   FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
> {
>   use_operand_p use_p;
>   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> ;
> }
> 
> which expands to (w/o macros)
> 
>imm_use_iterator iter; 
>for (gimple *use_stmt = first_imm_use_stmt (, name);
> !end_imm_use_stmt_p ();
> use_stmt = nest_imm_use_stmt ())
>  for (use_operand_p use_p = first_imm_use_on_stmt ();
>   !end_imm_use_on_stmt_p ();
>   use_p = next_imm_use_on_stmt ())
>;
> 
> and my foolish C++ attempt results in
> 
>  for (imm_use_stmt_iter it = SSAVAR; !it.end_p (); ++it)
>for (imm_use_stmt_iter::use_on_stmt it2 = it; !it2.end_p ();
> ++it2)
>  ;
> 
> with *it providing the gimple * USE_STMT and *it2 the use_operand_p.
> The complication here is to map the two-level iteration to "the C++
> way".
> Are there any STL examples mimicing this?  Of course with C++11 we
> could
> do
> 
>   for (imm_use_stmt_iter it = SSAVAR; !it.end_p (); ++it)
> for (auto it2 = it.first_use_on_stmt (); !it2.end_p (); ++it2)
>   ;
> 
> but that's not much nicer either.

Is there a way to put it in such a way that the iterators follow
standard concepts for iterators?  It would increase chances of it
becoming nicer by utilizing range based for loops.

Cheers,
Oleg



Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Oleg Endo
On Sat, 2019-10-26 at 14:04 -0600, Jeff Law wrote:
> On 10/26/19 1:33 PM, Andrew Waterman wrote:
> > I don't know enough to say whether the legitimize_address hook is
> > sufficient for the intended purpose, but I am sure that RISC-V's
> > concerns are not unique: other GCC targets have to cope with
> > offset-size constraints that are coupled to register-allocation
> > constraints.
> 
> Yup.  I think every risc port in the 90s faces this problem.  I
> always
> wished for a generic mechanism for ports to handle this problem.
> 
> Regardless, it's probably worth investigating.
> 

What we tried to do with the address mode selection (AMS) optimization
some time ago was the following:

  - Extract memory accesses from the insns stream and put them in 
"access sequences".  Also analyze the address expression and try   
to find effective base addresses by tracing back address 
calculations.

  - For each memory access, get a set of address mode alternatives and 
the corresponding costs from the backend.  The full context of each
access is provided, so the backend can detect things like 
"in this access sequence, FP loads dominate" and use this 
information to tune the alternative costs.

  - Given the alternatives and costs for each memory access, the pass 
would then try to minimize the costs of the whole memory access
sequence, taking costs of address modification isnns into account. 

I think this is quite generic, but of course also complex.  The
optimization problem itself is hard.  There was some research done by
others using  CPLEX or PBQP solvers.  To keep things simple we used a 
backtracking algorithm and handled only a limited set of scenarios. 
For example, I think we could not get loop constructs work nicely to
improve post-inc address mode utilization.

The SH ISA has very similar properties to ARM thumb and RVC, and
perhaps others.  Advantages would not be limited to RISC only, even
CISC ISAs like M68K, RX, ... can benefit from it, as the "proper
optimization" can reduce the instruction sizes by shortening the
addresses in the instruction stream.

If anyone is interested, here is the AMS optimization pass class:

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

It's using a different style to callback into the backend code.  Not
GCC's "hooks" but a delegate pattern.  SH backend's delegate
implementation is here

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11897

We were getting some improvements in the generated code, but it started
putting pressure on register allocation related issues in the SH
backend (the R0 problem), so we could not do more proper testing.

Maybe somebody can get some ideas out of it.

Cheers,
Oleg



Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Oleg Endo
On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be
> > compressed by
> > replacing a load/store of 'base register + large offset' with a new
> > load/store
> > of 'new base + small offset'. If the new base gets stored in a
> > compressed
> > register, then the new load/store can be compressed. Since there is
> > an overhead
> > in creating the new base, this change is only attempted when 'base
> > register' is
> > referenced in at least 4 load/stores in a basic block.
> > 
> > The optimization is implemented in a new RISC-V specific pass
> > called
> > shorten_memrefs which is enabled for RVC targets. It has been
> > developed for the
> > 32-bit lw/sw instructions but could also be extended to 64-bit
> > ld/sd in future.
> > 
> > Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc,
> > rv64imac,
> > rv64imafdc via QEMU. No regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config.gcc: Add riscv-shorten-memrefs.o to extra_objs for
> > riscv.
> > * config/riscv/riscv-passes.def: New file.
> > * config/riscv/riscv-protos.h (make_pass_shorten_memrefs):
> > Declare.
> > * config/riscv/riscv-shorten-memrefs.c: New file.
> > * config/riscv/riscv.c (tree-pass.h): New include.
> > (riscv_compressed_reg_p): New Function
> > (riscv_compressed_lw_offset_p): Likewise.
> > (riscv_compressed_lw_address_p): Likewise.
> > (riscv_shorten_lw_offset): Likewise.
> > (riscv_legitimize_address): Attempt to convert base +
> > large_offset
> > to compressible new_base + small_offset.
> > (riscv_address_cost): Make anticipated compressed load/stores
> > cheaper for code size than uncompressed load/stores.
> > (riscv_register_priority): Move compressed register check to
> > riscv_compressed_reg_p.
> > * config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET):
> > Define.
> > * config/riscv/riscv.opt (mshorten-memefs): New option.
> > * config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> > (PASSES_EXTRA): Add riscv-passes.def.
> > * doc/invoke.texi: Document -mshorten-memrefs.
> 
> This has traditionally been done via the the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
> 
> The hook, IIRC, is called out explow.c.
> 

This sounds like some of my addressing mode selection (AMS) attempts on
SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
pretty much the same.

On SH legitimize_address is used to do ... "something" ... to the
address in order to make the displacement fit.  The issue is,
legitimize_address doesn't get any context so it can't even try to find
a local optimal base address or something like that.

Cheers,
Oleg



Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-11 Thread Oleg Endo
On Thu, 2019-10-03 at 19:34 -0600, Jeff Law wrote:
> 
> So probably the most interesting target for this test is v850-elf as
> it's got a reasonably well functioning simulator, hard and soft FP
> targets, little endian, and I'm familiar with its current set of
> failures.
> 
> I can confirm that your patch makes no difference in the test results
> (which includes execution results).
> 
> In fact, there haven't been any problems on any target in my tester
> that
> I can tie back to this change.
> 
> At this point I'd say let's go for it.
> 

Thanks, Jeff.  I'll commit it to trunk if there are no further
objections some time next week.

It's a latent issue, not a regression.  OK for the open branches, too? 
Any opinions on that?

Cheers,
Oleg



Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-11 Thread Oleg Endo
Hi Bernd,

On Sun, 2019-09-29 at 08:49 +, Bernd Edlinger wrote:
> 
> But I cannot tell if the bitfield access generates
> more efficient code or identical code than the
> original variant when no ms bitfields are used.
> That needs closer inspection of the generated
> assembler code, a simple bootstrap / regtest will
> probably not be sufficient.

Thanks for your comment.

The bitfields in this case are used for packing and unpacking the FP
values.  So I gave it a try on RX and SH as examples (with trunk
compiler).  I've extracted the unpack function for a "double" value and
compared the generated assembly code.  Please see the attached file.

The bitfield code on RX is equivalent.  On SH bitfields is slightly
worse.  It gets *a lot* worse when the struct is passed by
reference/pointer, then things get completely out of control.

Theoretically, if the bitfields and the shifts-and-ands extract/pack
the same bits from the underlying base integer value, the resulting
code should be the same in both cases (see RX).  But it depends on what
the backend does, or tries to do (see SH).

> But my thought is if the -mms-bitfields option has such
> an impact on this structure, then it would be good if there
> was a built-in define that can be used to adjust to and/or
> diagnose the problem at compile time.

I think it'd be better to add static_assert on the expected sizes of
the various float value structs.


> 
> I think that is missing right now, but wouldn't it be nice to have
> a define like __MS_BITFIELD_LAYOUT__ ?

Honestly, I'm not a big fan of the whole MS bitfield thing.  On RX it's
been put there long time ago for compatibility with some other
compiler.  I'm not sure if it's relevant at all anymore.  So I don't
want to add any more to the pile ...

Cheers,
Oleg

typedef unsigned long long fractype;

struct unpacked_double
{
  fractype fraction;
  int exp;
  int sign;
};

struct  __attribute__ ((packed))
bits_little_endian 
{
  fractype fraction:52 __attribute__ ((packed));
  unsigned int exp:11 __attribute__ ((packed));
  unsigned int sign:1 __attribute__ ((packed));
};

static_assert (sizeof (bits_little_endian) == 8, "");
static_assert (sizeof (long long) == 8, "");


// 

unpacked_double
unpack_d_bitfields (bits_little_endian val)
{
  fractype fraction = val.fraction;
  int exp = val.exp;
  int sign = val.sign;

  return { fraction, exp, sign };
}

/*
RX -O2

__Z18unpack_d_bitfieldsRK18bits_little_endian:
	mov.L	r2, r4	 ; 4	[c=4 l=2]  *movsi_internal/4
	shlr	#20, r2, r3	 ; 9	[c=8 l=3]  lshrsi3/2
	add	#-16, r0	 ; 58	[c=4 l=3]  addsi3_internal/3
	mov.L	#0xf, r2	 ; 57	[c=4 l=5]  *movsi_internal/2
	and	r4, r2	 ; 43	[c=4 l=2]  andsi3/0
	and	#0x7ff, r3	 ; 44	[c=4 l=4]  andsi3/3
	shlr	#31, r4	 ; 45	[c=8 l=2]  lshrsi3/1
	rtsd	#16		 ; 61	[c=4 l=2]  deallocate_and_return

SH -O2

__Z18unpack_d_bitfieldsRK18bits_little_endian:
.LFB0:
	.cfi_startproc
	.cfi_startproc
	add	#-8,r15	! 85	[c=4 l=2]  *addsi3/0
	.cfi_def_cfa_offset 8
	mov	r2,r0	! 2	[c=4 l=2]  movsi_ie/1
	mov.l	.L3,r3	! 32	[c=10 l=2]  movsi_ie/0
	mov	#-21,r2	! 82	[c=4 l=2]  movsi_ie/2
	mov	r5,r1	! 81	[c=4 l=2]  movsi_ie/1
	add	r1,r1	! 16	[c=4 l=2]  ashlsi3_k/0
	shld	r2,r1	! 71	[c=4 l=2]  lshrsi3_d
	mov	r5,r2	! 83	[c=4 l=2]  movsi_ie/1
	shll	r2		! 72	[c=0 l=2]  shll
	mov.l	r4,@r0	! 44	[c=4 l=2]  movsi_ie/8
	movt	r2		! 73	[c=4 l=2]  movt
	mov.l	r1,@(8,r0)	! 35	[c=4 l=2]  movsi_ie/8
	and	r3,r5	! 33	[c=4 l=2]  *andsi_compact/3
	mov.l	r2,@(12,r0)	! 36	[c=4 l=2]  movsi_ie/8
	mov.l	r5,@(4,r0)	! 45	[c=4 l=2]  movsi_ie/8
	rts			! 91	[c=0 l=2]  *return_i
	add	#8,r15	! 90	[c=4 l=2]  *addsi3/0
	.cfi_endproc

*/

// 

unpacked_double
unpack_d_no_bitfields (long long val)
{
  fractype fraction = val & fractype)1) << 52) - 1);
  int exp = ((int)(val >> 52)) & ((1 << 11) - 1);
  int sign = ((int)(val >> (52 + 11))) & 1;

  return { fraction, exp, sign };
}

/*
RX -O2
	mov.L	r2, r4	 ; 4	[c=4 l=2]  *movsi_internal/4
	shar	#20, r2, r3	 ; 13	[c=8 l=3]  ashrsi3/2
	add	#-16, r0	 ; 56	[c=4 l=3]  addsi3_internal/3
	mov.L	#0xf, r2	 ; 55	[c=4 l=5]  *movsi_internal/2
	and	r4, r2	 ; 42	[c=4 l=2]  andsi3/0
	and	#0x7ff, r3	 ; 43	[c=4 l=4]  andsi3/3
	shlr	#31, r4	 ; 44	[c=8 l=2]  lshrsi3/1
	rtsd	#16		 ; 59	[c=4 l=2]  deallocate_and_return

SH -O2
	mov	r2,r0	! 2	[c=4 l=2]  movsi_ie/1
	mov.l	.L6,r1	! 10	[c=10 l=2]  movsi_ie/0
	mov.l	r4,@r2	! 9	[c=4 l=2]  movsi_ie/8
	and	r5,r1	! 11	[c=4 l=2]  *andsi_compact/3
	mov.l	r1,@(4,r2)	! 12	[c=4 l=2]  movsi_ie/8
	mov	#-21,r2	! 69	[c=4 l=2]  movsi_ie/2
	mov	r5,r1	! 68	[c=4 l=2]  movsi_ie/1
	shll	r5		! 59	[c=0 l=2]  shll
	add	r1,r1	! 18	[c=4 l=2]  ashlsi3_k/0
	shld	r2,r1	! 58	[c=4 l=2]  lshrsi3_d
	movt	r5		! 60	[c=4 l=2]  movt
	mov.l	r1,@(8,r0)	! 20	[c=4 l=2]  movsi_ie/8
	rts			! 74	[c=0 l=2]  *return_i
	mov.l	r5,@(12,r0)	! 28	[c=4 l=2]  movsi_ie/8
.L7:
	.align 2
.L6:
	.long	1048575

*/



Re: [SH][committed] Fix PR 88630

2019-10-11 Thread Oleg Endo
On Fri, 2019-10-11 at 00:36 +0900, Oleg Endo wrote:
> Hi,
> 
> When we did the refactoring of SH's FPSCR handling back in GCC 5, we
> missed one thing regarding ST-40, it seems.
> 
> The attached patch fixes the issue as confirmed on the real
> hardware. 
> Also tested on sh-sim with
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-
> m2a/-mb,-m4/-ml,-m4/-mb}"
> 
> 
> Committed to trunk, GCC 9, GCC 8 as r276809, r276825, r276837.

I forgot about GCC 7.  Committed there as well as r276877.

Cheers,
Oleg



[SH][committed] Fix PR 88630

2019-10-10 Thread Oleg Endo
Hi,

When we did the refactoring of SH's FPSCR handling back in GCC 5, we
missed one thing regarding ST-40, it seems.

The attached patch fixes the issue as confirmed on the real hardware. 
Also tested on sh-sim with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-
m2a/-mb,-m4/-ml,-m4/-mb}"


Committed to trunk, GCC 9, GCC 8 as r276809, r276825, r276837.

Cheers,
Oleg

gcc/ChangeLog:
PR target/88630
* config/sh/sh.h (TARGET_FPU_SH4_300): New macro.
* config/sh/sh.c (sh_option_override): Enable fsca and fsrra insns
also for TARGET_FPU_SH4_300.
(sh_emit_mode_set): Check for TARGET_FPU_SH4_300 instead of
TARGET_SH4_300.
* config/sh/sh.md (toggle_pr): Add TARGET_FPU_SH4_300 condition.
(negsf2): Expand to either negsf2_fpscr or negsf2_no_fpscr.
(*negsf2_i): Split into ...
(negsf2_fpscr, negsf2_no_fpscr): ... these new patterns.
(abssf2): Expand to either abssf2_fpsc or abssf2_no_fpsc.
(**abssf2_i): Split into ...
(abssf2_fpscr, abssf2_no_fpscr): ... these new patterns.
(negdf2): Expand to either negdf2_fpscr or negdf2_no_fpscr.
(*negdf2_i): Split into ...
(negdf2_fpscr, negdf2_no_fpscr): ... these new patterns.
(absdf2): Expand to either absdf2_fpscr or absdf2_no_fpsc.
(**abssf2_i): Split into ...
(absdf2_fpscr, absdf2_no_fpscr): ... these new patterns.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276411)
+++ gcc/config/sh/sh.c	(working copy)
@@ -958,11 +958,13 @@
   if (flag_unsafe_math_optimizations)
 {
   /* Enable fsca insn for SH4A if not otherwise specified by the user.  */
-  if (global_options_set.x_TARGET_FSCA == 0 && TARGET_SH4A_FP)
+  if (global_options_set.x_TARGET_FSCA == 0
+	  && (TARGET_SH4A_FP || TARGET_FPU_SH4_300))
 	TARGET_FSCA = 1;
 
   /* Enable fsrra insn for SH4A if not otherwise specified by the user.  */
-  if (global_options_set.x_TARGET_FSRRA == 0 && TARGET_SH4A_FP)
+  if (global_options_set.x_TARGET_FSRRA == 0
+	  && (TARGET_SH4A_FP || TARGET_FPU_SH4_300))
 	TARGET_FSRRA = 1;
 }
 
@@ -12490,7 +12492,7 @@
 sh_emit_mode_set (int entity ATTRIBUTE_UNUSED, int mode,
 		  int prev_mode, HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
 {
-  if ((TARGET_SH4A_FP || TARGET_SH4_300)
+  if ((TARGET_SH4A_FP || TARGET_FPU_SH4_300)
   && prev_mode != FP_MODE_NONE && prev_mode != mode)
 {
   emit_insn (gen_toggle_pr ());
Index: gcc/config/sh/sh.h
===
--- gcc/config/sh/sh.h	(revision 276410)
+++ gcc/config/sh/sh.h	(working copy)
@@ -69,6 +69,8 @@
FPU is disabled (which makes it compatible with SH4al-dsp).  */
 #define TARGET_SH4A_FP (TARGET_SH4A && TARGET_FPU_ANY)
 
+/* True if the FPU is a SH4-300 variant.  */
+#define TARGET_FPU_SH4_300 (TARGET_FPU_ANY && TARGET_SH4_300)
 
 /* This is not used by the SH2E calling convention  */
 #define TARGET_VARARGS_PRETEND_ARGS(FUN_DECL) \
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 276410)
+++ gcc/config/sh/sh.md	(working copy)
@@ -9163,7 +9163,7 @@
 	(xor:SI (reg:SI FPSCR_REG) (const_int FPSCR_PR)))
(set (reg:SI FPSCR_MODES_REG)
 	(unspec_volatile:SI [(const_int 0)] UNSPECV_FPSCR_MODES))]
-  "TARGET_SH4A_FP"
+  "TARGET_SH4A_FP || TARGET_FPU_SH4_300"
   "fpchg"
   [(set_attr "type" "fpscr_toggle")])
 
@@ -9391,15 +9391,31 @@
 (define_expand "negsf2"
   [(set (match_operand:SF 0 "fp_arith_reg_operand")
 	(neg:SF (match_operand:SF 1 "fp_arith_reg_operand")))]
-  "TARGET_SH2E")
+  "TARGET_FPU_ANY"
+{
+  if (TARGET_FPU_SH4_300)
+emit_insn (gen_negsf2_fpscr (operands[0], operands[1]));
+  else
+emit_insn (gen_negsf2_no_fpscr (operands[0], operands[1]));
+  DONE;
+})
 
-(define_insn "*negsf2_i"
+(define_insn "negsf2_no_fpscr"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(neg:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")))]
-  "TARGET_SH2E"
+  "TARGET_FPU_ANY && !TARGET_FPU_SH4_300"
   "fneg	%0"
   [(set_attr "type" "fmove")])
 
+(define_insn "negsf2_fpscr"
+  [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
+	(neg:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")))
+   (use (reg:SI FPSCR_MODES_REG))]
+  "TARGET_FPU_SH4_300"
+  "fneg	%0"
+  [(set_attr "type" "fmove")
+   (set_attr "fp_mode" "single")])
+
 (define_expand "sqrtsf2"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
 	(sqrt:SF (match_operand:SF 1 "fp_arith_reg_operand" "")))]
@@ -9489,15 +9505,31 @@
 (define_expand "abssf2"
   [(set (match_operand:SF 0 "fp_arith_reg_operand")
 	(abs:SF (match_operand:SF 1 "fp_arith_reg_operand")))]
-  "TARGET_SH2E")
+  "TARGET_FPU_ANY"
+{
+  if (TARGET_FPU_SH4_300)
+emit_insn (gen_abssf2_fpscr (operands[0], operands[1]));
+  else
+emit_insn (gen_abssf2_no_fpscr (operands[0], 

Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-02 Thread Oleg Endo
On Tue, 2019-10-01 at 14:21 -0600, Jeff Law wrote:
> 
> So the ask is to just test this on some LE targets?  I can do that :-)
> 
> I'll throw it in.  Analysis will be slightly more difficult than
> usual
> as we've got some fallout from Richard S's work, but it's certainly
> do-able.

Thanks a lot!


> ps.  ANd yes, I've got a request to the build farm folks to get a
> jenkins instance on the build farm.  Once that's in place I can have
> my tester start publishing results that everyone can see.

Sounds great.  Would it be possible for other people to give the auto
tester patches for testing and get the results back from it?  Or
something like that?

Cheers,
Oleg



[SH][committed] Fix PR 88562

2019-10-01 Thread Oleg Endo
Hi,

The attached patch fixes PR 88562.

Tested on trunk with
   make -k check 
RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb}"

Committed to trunk, GCC 9, GCC 8, GCC 7 as r276411, r276412, r276413, r276414.

Cheers,
Oleg


gcc/ChangeLog:
PR target/88562
* config/sh/sh.c (sh_extending_set_of_reg::use_as_extended_reg): Use
sh_check_add_incdec_notes to preserve REG_INC notes when replacing
a memory access insn.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276264)
+++ gcc/config/sh/sh.c	(working copy)
@@ -12068,9 +12068,11 @@
 	rtx r = gen_reg_rtx (SImode);
 	rtx_insn* i0;
 	if (from_mode == QImode)
-	  i0 = emit_insn_after (gen_extendqisi2 (r, set_src), insn);
+	  i0 = sh_check_add_incdec_notes (
+			emit_insn_after (gen_extendqisi2 (r, set_src), insn));
 	else if (from_mode == HImode)
-	  i0 = emit_insn_after (gen_extendhisi2 (r, set_src), insn);
+	  i0 = sh_check_add_incdec_notes (
+			emit_insn_after (gen_extendhisi2 (r, set_src), insn));
 	else
 	  gcc_unreachable ();
 


[RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-09-28 Thread Oleg Endo
Hi,

I've been dragging this patch along with me for a while.
At the moment, I don't have the resources to fully test it as requested
by Ian in the PR discussion.

So I would like to ask for general comments on this one and hope that
folks with bigger automated test setups can run the patch through their
machinery for little endian targets.


Summary of the story:

I've noticed this issue on the RX on GCC 6, but it seems it's been
there forever.

On RX, fp-bit is used for software floating point emulation.  The RX
target also uses "MS bit-field" layout by default.  This means that
code like

struct
{
  fractype fraction:FRACBITS __attribute__ ((packed));
  unsigned int exp:EXPBITS __attribute__ ((packed));
  unsigned int sign:1 __attribute__ ((packed));
} bits;

will result in sizeof (bits) != 8

For some reason, this bit-field style declaration is used only for
FLOAT_BIT_ORDER_MISMATCH, which generally seems to be set for little
endian targets.  In other cases (i.e. big endian) open coded bit field
extraction and packing is used on the base integer type, like

 fraction = src->value_raw & fractype)1) << FRACBITS) - 1);
 exp = ((int)(src->value_raw >> FRACBITS)) & ((1 << EXPBITS) - 1);
 sign = ((int)(src->value_raw >> (FRACBITS + EXPBITS))) & 1;

This works of course regardless of the bit-field packing layout of the
target.

Joseph suggested to pack the struct bit, which would fix the issue.  
https://gcc.gnu.org/ml/gcc-bugs/2017-08/msg01651.html

However, I would like to propose to remove the special case of
FLOAT_BIT_ORDER_MISMATCH altogether as in the attached patch.

Any comments?

Cheers,
Oleg



libgcc/ChangeLog

PR libgcc/77804
* fp-bit.h: Remove FLOAT_BIT_ORDER_MISMATCH.
* fp-bit.c (pack_d, unpack_d): Remove special cases for 
FLOAT_BIT_ORDER_MISMATCH.
* config/arc/t-arc: Remove FLOAT_BIT_ORDER_MISMATCH.
Index: libgcc/config/arc/t-arc
===
--- libgcc/config/arc/t-arc	(revision 251045)
+++ libgcc/config/arc/t-arc	(working copy)
@@ -46,7 +46,6 @@
 
 dp-bit.c: $(srcdir)/fp-bit.c
 	echo '#ifndef __big_endian__' > dp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >> dp-bit.c
 	echo '#endif' >> dp-bit.c
 	echo '#include "fp-bit.h"' >> dp-bit.c
 	echo '#include "config/arc/dp-hack.h"' >> dp-bit.c
@@ -55,7 +54,6 @@
 fp-bit.c: $(srcdir)/fp-bit.c
 	echo '#define FLOAT' > fp-bit.c
 	echo '#ifndef __big_endian__' >> fp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >> fp-bit.c
 	echo '#endif' >> fp-bit.c
 	echo '#include "config/arc/fp-hack.h"' >> fp-bit.c
 	cat $(srcdir)/fp-bit.c >> fp-bit.c
Index: libgcc/fp-bit.c
===
--- libgcc/fp-bit.c	(revision 251045)
+++ libgcc/fp-bit.c	(working copy)
@@ -316,12 +316,7 @@
   /* We previously used bitfields to store the number, but this doesn't
  handle little/big endian systems conveniently, so use shifts and
  masks */
-#ifdef FLOAT_BIT_ORDER_MISMATCH
-  dst.bits.fraction = fraction;
-  dst.bits.exp = exp;
-  dst.bits.sign = sign;
-#else
-# if defined TFLOAT && defined HALFFRACBITS
+#if defined TFLOAT && defined HALFFRACBITS
  {
halffractype high, low, unity;
int lowsign, lowexp;
@@ -394,11 +389,10 @@
  }
dst.value_raw = ((fractype) high << HALFSHIFT) | low;
  }
-# else
+#else
   dst.value_raw = fraction & fractype)1) << FRACBITS) - (fractype)1);
   dst.value_raw |= ((fractype) (exp & ((1 << EXPBITS) - 1))) << FRACBITS;
   dst.value_raw |= ((fractype) (sign & 1)) << (FRACBITS | EXPBITS);
-# endif
 #endif
 
 #if defined(FLOAT_WORD_ORDER_MISMATCH) && !defined(FLOAT)
@@ -450,12 +444,7 @@
   src = 
 #endif
   
-#ifdef FLOAT_BIT_ORDER_MISMATCH
-  fraction = src->bits.fraction;
-  exp = src->bits.exp;
-  sign = src->bits.sign;
-#else
-# if defined TFLOAT && defined HALFFRACBITS
+#if defined TFLOAT && defined HALFFRACBITS
  {
halffractype high, low;

@@ -498,11 +487,10 @@
 	 }
  }
  }
-# else
+#else
   fraction = src->value_raw & fractype)1) << FRACBITS) - 1);
   exp = ((int)(src->value_raw >> FRACBITS)) & ((1 << EXPBITS) - 1);
   sign = ((int)(src->value_raw >> (FRACBITS + EXPBITS))) & 1;
-# endif
 #endif
 
   dst->sign = sign;
Index: libgcc/fp-bit.h
===
--- libgcc/fp-bit.h	(revision 251045)
+++ libgcc/fp-bit.h	(working copy)
@@ -128,10 +128,6 @@
 #define NO_DI_MODE
 #endif
 
-#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-#define FLOAT_BIT_ORDER_MISMATCH
-#endif
-
 #if __BYTE_ORDER__ != __FLOAT_WORD_ORDER__
 #define FLOAT_WORD_ORDER_MISMATCH
 #endif
@@ -354,16 +350,6 @@
 # endif
 #endif
 
-#ifdef FLOAT_BIT_ORDER_MISMATCH
-  struct
-{
-  fractype fraction:FRACBITS __attribute__ ((packed));
-  unsigned int exp:EXPBITS __attribute__ ((packed));
-  unsigned int sign:1 __attribute__ ((packed));
-}
-  bits;
-#endif
-
 #ifdef _DEBUG_BITFLOAT
   struct
 {


Re: [PATCH v2] libitm: sh: avoid absolute relocation in shared library (PR 86712)

2019-09-28 Thread Oleg Endo
On Sat, 2018-08-04 at 18:00 +0900, Oleg Endo wrote:
> On Fri, 2018-08-03 at 14:54 -0600, Jeff Law wrote:
> > On 07/28/2018 07:04 AM, slyfox.inbox.ru via gcc-patches wrote:
> > > 
> > > From: Sergei Trofimovich 
> > > 
> > > Cc: Andreas Schwab 
> > > Cc: Torvald Riegel 
> > > Cc: Alexandre Oliva 
> > > Cc: Oleg Endo 
> > > Cc: Kaz Kojima 
> > > Signed-off-by: Sergei Trofimovich 
> > > ---
> > >  libitm/config/sh/sjlj.S | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/libitm/config/sh/sjlj.S b/libitm/config/sh/sjlj.S
> > > index 043f36749be..f265ab8f898 100644
> > > --- a/libitm/config/sh/sjlj.S
> > > +++ b/libitm/config/sh/sjlj.S
> > > @@ -53,7 +53,7 @@ _ITM_beginTransaction:
> > >  #else
> > >   cfi_def_cfa_offset (4*10)
> > >  #endif
> > > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > > +#if !defined __PIC__
> > >   mov.l   .Lbegin, r1
> > >   jsr @r1
> > >movr15, r5
> > > @@ -78,7 +78,7 @@ _ITM_beginTransaction:
> > >  
> > >   .align  2
> > >  .Lbegin:
> > > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > > +#if !defined __PIC__
> > >   .long   GTM_begin_transaction
> > >  #else
> > >   .long   GTM_begin_transaction@PCREL-(.Lbegin0-.)
> > > 
> > 
> > THanks.  I installed this version.
> > 
> 
> Thanks Jeff.
> If there are no objections, I'll backport it to the 7 and 8 branches.
> 
> Cheers,
> Oleg


Finally  committed to GCC 8 as r276246 and to GCC 7 as r276247.

Cheers,
Oleg




[SH][committed] Fix PR 86805

2019-09-28 Thread Oleg Endo
Hi,

This also sets TARGET_HAVE_SPECULATION_SAFE_VALUE to
speculation_safe_value_not_needed for SH.

Tested with "make all-gcc".

Committed on trunk as r276244 and on GCC 9 as r276245.

Cheers,
Oleg

gcc/ChangeLog

PR target/86805
* config/sh/sh.c (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276243)
+++ gcc/config/sh/sh.c	(working copy)
@@ -661,6 +661,9 @@
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT constant_alignment_word_strings
 
+#undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 


[SH][committed] Fix PR 80672

2019-09-28 Thread Oleg Endo
Hi,

The attached patch fixes PR 80672.
Tested by building the compiler with "make all-gcc" and manually
invoking it and checking that the option is parsed as expected.

Committed to trunk as r276240, GCC 9 as r276241, GCC 8 as r276242, GCC
7 as r276243.

Cheers,
Oleg


gcc/ChangeLog
PR target/80672
* config/sh/sh.c (parse_validate_atomic_model_option): Use
std::string::compare instead of std::string::find.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276235)
+++ gcc/config/sh/sh.c	(working copy)
@@ -734,7 +734,7 @@
 {
   if (tokens[i] == "strict")
 	ret.strict = true;
-  else if (tokens[i].find ("gbr-offset=") == 0)
+  else if (!tokens[i].compare (0, strlen ("gbr-offset="), "gbr-offset="))
 	{
 	  std::string offset_str = tokens[i].substr (strlen ("gbr-offset="));
 	  ret.tcb_gbr_offset = integral_argument (offset_str.c_str ());


Re: [committed] [PR target/85993] Remove dead conditional in SH target code

2019-09-28 Thread Oleg Endo
On Sun, 2018-07-15 at 14:30 -0600, Jeff Law wrote:
> 
> Per Oleg's comment in the PR, the second block is dead and should be
> removed...
> 
> Committing to the trunk.   While I'm confident this won't change
> anything, my tester will bootstrap sh4 & sh4eb overnight for
> additional
> verification.
> 

Probably irrelevant, but just for the record ...

I've just backported this patch to GCC 7 and GCC 8 branches as r276237
and r276239.

Tested briefly with "make all-gcc".

Cheers,
Oleg



Re: [1/9] Simplify the implementation of HARD_REG_SET

2019-09-10 Thread Oleg Endo
On Mon, 2019-09-09 at 19:05 +0100, Richard Sandiford wrote:
> 
> Yeah.  I might come back to this later and look at a fuller
> transition
> to C++ (or at least to try to get rid of CLEAR_HARD_REG_SET).
> 

Maybe you can just typedef it to std::bitset ;)

Cheers,
Oleg



Re: [PATCH 26/30] Changes to sh

2019-06-28 Thread Oleg Endo
On Tue, 2019-06-25 at 15:22 -0500, acsaw...@linux.ibm.com wrote:
> From: Aaron Sawdey 
> 
>   * config/sh/sh.md (movmemsi): Change name to cpymemsi.
> ---
>  gcc/config/sh/sh.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index 8354377..ed70e34 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -8906,7 +8906,7 @@
>  
>  ;; String/block move insn.
>  
> -(define_expand "movmemsi"
> +(define_expand "cpymemsi"
>[(parallel [(set (mem:BLK (match_operand:BLK 0))
>  (mem:BLK (match_operand:BLK 1)))
> (use (match_operand:SI 2 "nonmemory_operand"))

Looks like a trivial change.  It's OK.

Cheers,
Oleg



Re: [PATCH] RX: Add rx-*-linux target

2019-06-03 Thread Oleg Endo
On Sun, 2019-06-02 at 12:37 -0500, Segher Boessenkool wrote:
> 
> This is -m64bit-doubles/-m32bit-doubles, and t-rx already multilibs
> that?

Yes, it does.

> And the default is 32-bit.  So why does rx-linux need something
> different?
> You could make a point for wanting 64-bit doubles as default, even;
> disabling it completely does not seem like a good plan.
> 

On RX, "default to 32-bit" means DF=SF, always.  I.e. DF is disabled
for good.  Usually I built RX stuff always with -m64bit-doubles because
of that.  It should be actually the default and -m32bit-doubles should
be some kind of "optimization" switch.  So in my optinion, -m64bit-
doubles should be the default for a linux target, to allow having
actual doubles, and not just floats.

Cheers,
Oleg




Re: [PATCH] RX: Add rx-*-linux target

2019-06-02 Thread Oleg Endo
On Sun, 2019-06-02 at 20:26 +0900, Yoshinori Sato wrote:
> On Fri, 31 May 2019 09:16:18 +0900,
> Jeff Law wrote:
> > 
> > On 5/29/19 12:27 PM, Jeff Law wrote:
> > > On 5/23/19 6:05 AM, Yoshinori Sato wrote:
> > > > I ported linux kernel to Renesas RX.
> > > > 
> > > > rx-*-elf target output a binary different from the standard
> > > > ELF.
> > > > It has the same format as the Renesas compiler.
> > > > 
> > > > But the linux kernel requires the standard ELF format.
> > > > I want to define a rx-*-linux target so that I can generate
> > > > a standard ELF binary.
> > > 
> > > Presumably you're resubmitting after your assignment got recorded
> > > (I
> > > think I saw that fly by recently).
> > > 
> > > I'll construct a ChangeLog and install this on the trunk.
> > 
> > So this is causing libgcc to fail to build for rx-elf.  The problem
> > is
> > the DF=SF #define.  I think you need so split that out so that it's
> > only
> > used for rx-linux.
> > 
> > Jeff
> 
> OK. fix it.
> I tried build rx-elf target. it success.
> 

Setting DF=SF is the wrong thing to do IMHO.  RX can do DF just fine in
software.  If this is hardcoded like that in the roots of the
toolchain, it will make compiling packages that actually need real DF
completely impossible, won't it?  We also don't set DI = SI just
because the hardware is bad at SI ... 

Just my 2 cents.

Cheers,
Oleg



Re: Recent combine change causing regressions

2019-05-13 Thread Oleg Endo
On Mon, 2019-05-13 at 09:38 -0500, Segher Boessenkool wrote:
> On Mon, May 13, 2019 at 08:27:15AM -0600, Jeff Law wrote:
> > sh3-linux-gnu and sh3eb-linux-gnu:
> 
> I test sh2 and sh4, but not sh3 :-)
> 
> > Tests that now fail, but worked before (3 tests):
> > 
> > gcc.target/sh/pr51244-11.c scan-assembler-not subc|and|bra
> > gcc.target/sh/pr51244-11.c scan-assembler-times bf\t0f 1
> > gcc.target/sh/pr51244-11.c scan-assembler-times bt\t0f 1
> > 
> > Previously we'd match this pattern:
> > 
> > (define_insn "*cset_zero"
> >   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> > (if_then_else:SI (match_operand:SI 1 "cbranch_treg_value")
> >  (match_operand:SI 2 "arith_reg_operand"
> > "0")
> >  (const_int 0)))]
> >   "TARGET_SH1 && TARGET_ZDCBRANCH"
> > 
> > After your change we no longer try to do so.
> > 
> > I really don't care about the SH port.  But isn't this really a
> > symptom
> > of a larger problem.  Namely that by not generating if-then-else
> > you've
> > hosed every target that implements conditional moves via if-then-
> > else
> > constructs?
> 
> I tested on 30-something targets (all *-linux), and only mips64
> regressed
> a little, everything else improved.  So the current tuning is better
> than
> what it was before.  No doubt it can be improved though!
> 
> This is only if-then-else for a single bit, fwiw.
> 
> I'll build some sh3-linux if I find a cycle or two.
> 

Hmmm .. on SH3 TARGET_ZDCBRANCH should be off, afair.
What would be the alternative now for the if-then-else?

Cheers,
Oleg



Re: [PATCH v2] libitm: sh: avoid absolute relocation in shared library (PR 86712)

2018-08-04 Thread Oleg Endo
On Fri, 2018-08-03 at 14:54 -0600, Jeff Law wrote:
> On 07/28/2018 07:04 AM, slyfox.inbox.ru via gcc-patches wrote:
> > 
> > From: Sergei Trofimovich 
> > 
> > Cc: Andreas Schwab 
> > Cc: Torvald Riegel 
> > Cc: Alexandre Oliva 
> > Cc: Oleg Endo 
> > Cc: Kaz Kojima 
> > Signed-off-by: Sergei Trofimovich 
> > ---
> >  libitm/config/sh/sjlj.S | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libitm/config/sh/sjlj.S b/libitm/config/sh/sjlj.S
> > index 043f36749be..f265ab8f898 100644
> > --- a/libitm/config/sh/sjlj.S
> > +++ b/libitm/config/sh/sjlj.S
> > @@ -53,7 +53,7 @@ _ITM_beginTransaction:
> >  #else
> >     cfi_def_cfa_offset (4*10)
> >  #endif
> > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > +#if !defined __PIC__
> >     mov.l   .Lbegin, r1
> >     jsr @r1
> >      movr15, r5
> > @@ -78,7 +78,7 @@ _ITM_beginTransaction:
> >  
> >     .align  2
> >  .Lbegin:
> > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > +#if !defined __PIC__
> >     .long   GTM_begin_transaction
> >  #else
> >     .long   GTM_begin_transaction@PCREL-(.Lbegin0-.)
> > 
> THanks.  I installed this version.
> 

Thanks Jeff.
If there are no objections, I'll backport it to the 7 and 8 branches.

Cheers,
Oleg


Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-22 Thread Oleg Endo
On Thu, 2018-02-22 at 15:41 +, Nick Clifton wrote:


> > > gcc/ChangeLog:
> > >   * config/rx/rx.c (rx_rtx_costs): New function.
> > >   (TARGET_RTX_COSTS): Override to use rx_rtx_costs.
> Approved - please apply.
> 

Thanks.  Committed as r257905.

Cheers,
Oleg


Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-22 Thread Oleg Endo
Ping.

On Thu, 2018-02-15 at 23:07 +0900, Oleg Endo wrote:
> On Wed, 2018-02-14 at 01:06 +0900, Oleg Endo wrote:
> > 
> >  
> > Do you happen to have any other numbers on the resulting code
> > size/speed?  Looking at the new costs that the patch introduces,
> > I'd
> > expect there to be some more changes than just the 1/x...
> > 
> I've checked your proposed patch with the CSiBE set for code size
> changes.
> 
> With your patch, the code size of the whole set:
> sum:  2806044 -> 2801346-4698 / -0.167424 %
> 
> 
> Taking out this piece
> 
>     case IF_THEN_ELSE:
>   *total = COSTS_N_INSNS (3);
>   return true;
> 
> from the rx_rtx_costs results in:
> sum:  2806044 -> 2801099-4945 / -0.176227 %
> 
> 
> Taking out another piece 
> 
>       if (GET_CODE (XEXP (x, 0)) == MEM
>  || GET_CODE (XEXP (x, 1)) == MEM)
>    *total = COSTS_N_INSNS (3);
>   else
> 
> results in:
> sum:  2806044 -> 2800315-5729 / -0.204166 %
> 
> So I'd like to propose the attached patch instead, as it eliminates 1
> KByte of code more from the whole set.
> 
> Just in case, I'm testing it now with
>   "make -k check" on rx-sim for c and c++
> 
> OK for trunk if it passes?
> 
> Cheers,
> Oleg
> 
> gcc/ChangeLog:
>   * config/rx/rx.c (rx_rtx_costs): New function.
>   (TARGET_RTX_COSTS): Override to use rx_rtx_costs.


Re: [RX] Fix PR 83831 -- Unused bclr, bnot, bset insns

2018-02-16 Thread Oleg Endo
On Wed, 2018-02-14 at 21:34 +0900, Oleg Endo wrote:


> > Approved - please apply - and thanks very much for doing this!
> Thanks!  Committed as r257655.
> 

Testing another patch
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00903.html

revealed a bug.
I've committed the attached somewhat obvious patch as r257735.

Cheers,
Oleg

gcc/ChangeLog:
PR target/83831
* config/rx/rx.c (rx_fuse_in_memory_bitop): Convert shift operand
to QImode.

gcc/testsuite/ChangeLog:
PR target/83831
* gcc.target/rx/pr83831.c (test_3, test_6): Adjust test cases.Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 257733)
+++ gcc/config/rx/rx.c	(working copy)
@@ -3515,7 +3515,7 @@
 if (volatile_insn_p (PATTERN (i)) || CALL_P (i))
   return false;
 
-  emit_insn (gen_insn (mem, operands[1]));
+  emit_insn (gen_insn (mem, gen_lowpart (QImode, operands[1])));
   set_insn_deleted (op2_def.insn);
   set_insn_deleted (op0_use);
   return true;
Index: gcc/testsuite/gcc.target/rx/pr83831.c
===
--- gcc/testsuite/gcc.target/rx/pr83831.c	(revision 257733)
+++ gcc/testsuite/gcc.target/rx/pr83831.c	(working copy)
@@ -1,8 +1,8 @@
 /* { dg-do compile }  */
 /* { dg-options "-O1" }  */
 /* { dg-final { scan-assembler-times "bclr" 6 } }  */
-/* { dg-final { scan-assembler-times "bset" 6 } }  */
-/* { dg-final { scan-assembler-times "bnot" 6 } }  */
+/* { dg-final { scan-assembler-times "bset" 7 } }  */
+/* { dg-final { scan-assembler-times "bnot" 7 } }  */
 
 void
 test_0 (char* x, unsigned int y)
@@ -29,13 +29,14 @@
 }
 
 void
-test_3 (char* x, unsigned int y)
+test_3 (char* x, unsigned int y, unsigned int z)
 {
-  /* Expect 4x bset here.  */
+  /* Expect 5x bset here.  */
   x[0] |= 0x10;
   x[1] = y | (1 << 1);
   x[2] |= 0x10;
   x[65000] |= 0x10;
+  x[5] |= 1 << z;
 }
 
 unsigned int
@@ -53,13 +54,14 @@
 }
 
 void
-test_6 (char* x, unsigned int y)
+test_6 (char* x, unsigned int y, unsigned int z)
 {
-  /* Expect 4x bnot here.  */
+  /* Expect 5x bnot here.  */
   x[0] ^= 0x10;
   x[1] = y ^ (1 << 1);
   x[2] ^= 0x10;
   x[65000] ^= 0x10;
+  x[5] ^= 1 << z;
 }
 
 unsigned int


Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-15 Thread Oleg Endo
On Wed, 2018-02-14 at 01:06 +0900, Oleg Endo wrote:
> 
> Do you happen to have any other numbers on the resulting code
> size/speed?  Looking at the new costs that the patch introduces, I'd
> expect there to be some more changes than just the 1/x...
> 

I've checked your proposed patch with the CSiBE set for code size
changes.

With your patch, the code size of the whole set:
sum:  2806044 -> 2801346-4698 / -0.167424 %


Taking out this piece

    case IF_THEN_ELSE:
  *total = COSTS_N_INSNS (3);
  return true;

from the rx_rtx_costs results in:
sum:  2806044 -> 2801099-4945 / -0.176227 %


Taking out another piece 

      if (GET_CODE (XEXP (x, 0)) == MEM
 || GET_CODE (XEXP (x, 1)) == MEM)
   *total = COSTS_N_INSNS (3);
  else

results in:
sum:  2806044 -> 2800315-5729 / -0.204166 %

So I'd like to propose the attached patch instead, as it eliminates 1
KByte of code more from the whole set.

Just in case, I'm testing it now with
  "make -k check" on rx-sim for c and c++

OK for trunk if it passes?

Cheers,
Oleg

gcc/ChangeLog:
* config/rx/rx.c (rx_rtx_costs): New function.
(TARGET_RTX_COSTS): Override to use rx_rtx_costs.Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 257655)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2976,6 +2976,62 @@
 }
 
 static bool
+rx_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
+	  int opno ATTRIBUTE_UNUSED, int* total, bool speed)
+{
+  if (x == const0_rtx)
+{
+  *total = 0;
+  return true;
+}
+
+  switch (GET_CODE (x))
+{
+case MULT:
+  if (mode == DImode)
+	{
+	  *total = COSTS_N_INSNS (2);
+	  return true;
+	}
+  /* fall through */
+
+case PLUS:
+case MINUS:
+case AND:
+case COMPARE:
+case IOR:
+case XOR:
+  *total = COSTS_N_INSNS (1);
+  return true;
+
+case DIV:
+  if (speed)
+	/* This is the worst case for a division.  Pessimize divisions when
+	   not optimizing for size and allow reciprocal optimizations which
+	   produce bigger code.  */
+	*total = COSTS_N_INSNS (20);
+  else
+	*total = COSTS_N_INSNS (3);
+  return true;
+
+case UDIV:
+  if (speed)
+	/* This is the worst case for a division.  Pessimize divisions when
+	   not optimizing for size and allow reciprocal optimizations which
+	   produce bigger code.  */
+	*total = COSTS_N_INSNS (18);
+  else
+	*total = COSTS_N_INSNS (3);
+  return true;
+
+default:
+  break;
+}
+
+  return false;
+}
+
+static bool
 rx_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
 {
   /* We can always eliminate to the frame pointer.
@@ -3709,6 +3765,9 @@
 #undef  TARGET_MODES_TIEABLE_P
 #define TARGET_MODES_TIEABLE_P			rx_modes_tieable_p
 
+#undef  TARGET_RTX_COSTS
+#define TARGET_RTX_COSTS rx_rtx_costs
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rx.h"


Re: [RX] Fix PR 83831 -- Unused bclr, bnot, bset insns

2018-02-14 Thread Oleg Endo
On Tue, 2018-02-13 at 17:04 +, Nick Clifton wrote:
> 
> > gcc/ChangeLog:
> > 
> > PR target/83831
> > * config/rx/rx-protos.h (rx_reg_dead_or_unused_after_insn,
> > rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
> > declarations.
> > (set_of_reg): New struct.
> > (rx_find_set_of_reg, rx_find_use_of_reg): New functions.
> > * config/rx/rx.c (rx_reg_dead_or_unused_after_insn,
> > rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
> > functions.
> > * config/rx/rx.md (andsi3, iorsi3, xorsi3): Convert to insn_and_split.
> > Split into bitclr, bitset, bitinvert patterns if appropriate.
> > (*bitset, *bitinvert, *bitclr): Convert to named insn_and_split and
> > use rx_fuse_in_memory_bitop.
> > (*bitset_in_memory, *bitinvert_in_memory, *bitclr_in_memory): Convert
> > to named insn, correct maximum insn length.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/83831
> > * gcc.target/rx/pr83831.c: New tests.

> Approved - please apply - and thanks very much for doing this!

Thanks!  Committed as r257655.

Cheers,
Oleg


[RX] Fix PR 83831 -- Unused bclr, bnot, bset insns

2018-02-13 Thread Oleg Endo
Hi,

The attached patch fixes the deficits mentioned in PR 83831 which is
about unused bclr, bnot and bset instructions.

For some simple cases, the combine pass can successfully fuse a load-
modify-store insn sequence into an insn that operates on a memory
directly.  However, in some cases where it thinks it's too complex, it
will not try to combine the insns.

What I'm doing here is similar to what I've been doing on SH in the
split1 pass after combine -- manually walking the insns up/down
(limited to the BB) to find the def/use and fuse 3 insns into 1, if
it's possible to do so.

For that I've copy-pasted some of the RTL utility functions from SH.  I
will propose folding and moving those into rtl.h / rtlanal.c during
next stage 1.

With that patch, I get a code size decrease of about 1 KByte on a
larger application.

The attached patch is the version for GCC 8 (trunk).  I've posted
versions for GCC 6 and GCC 7 in the PR.  All 3 patches have been tested
with 
   "make -k check" on rx-sim for c and c++

with no new failures.

OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

PR target/83831
* config/rx/rx-protos.h (rx_reg_dead_or_unused_after_insn,
rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
declarations.
(set_of_reg): New struct.
(rx_find_set_of_reg, rx_find_use_of_reg): New functions.
* config/rx/rx.c (rx_reg_dead_or_unused_after_insn,
rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
functions.
* config/rx/rx.md (andsi3, iorsi3, xorsi3): Convert to insn_and_split.
Split into bitclr, bitset, bitinvert patterns if appropriate.
(*bitset, *bitinvert, *bitclr): Convert to named insn_and_split and
use rx_fuse_in_memory_bitop.
(*bitset_in_memory, *bitinvert_in_memory, *bitclr_in_memory): Convert
to named insn, correct maximum insn length.

gcc/testsuite/ChangeLog:

PR target/83831
* gcc.target/rx/pr83831.c: New tests.Index: gcc/config/rx/rx-protos.h
===
--- gcc/config/rx/rx-protos.h	(revision 257549)
+++ gcc/config/rx/rx-protos.h	(working copy)
@@ -63,6 +63,112 @@
 extern void		rx_split_cbranch (machine_mode, enum rtx_code,
 	  rtx, rtx, rtx);
 extern machine_mode	rx_select_cc_mode (enum rtx_code, rtx, rtx);
+
+extern bool rx_reg_dead_or_unused_after_insn (const rtx_insn* i, int regno);
+extern void rx_copy_reg_dead_or_unused_notes (rtx reg, const rtx_insn* src,
+	  rtx_insn* dst);
+
+extern bool rx_fuse_in_memory_bitop (rtx* operands, rtx_insn* curr_insn,
+ rtx (*gen_insn)(rtx, rtx));
+
+/* Result value of rx_find_set_of_reg.  */
+struct set_of_reg
+{
+  /* The insn where sh_find_set_of_reg stopped looking.
+ Can be NULL_RTX if the end of the insn list was reached.  */
+  rtx_insn* insn;
+
+  /* The set rtx of the specified reg if found, NULL_RTX otherwise.  */
+  const_rtx set_rtx;
+
+  /* The set source rtx of the specified reg if found, NULL_RTX otherwise.
+ Usually, this is the most interesting return value.  */
+  rtx set_src;
+};
+
+/* FIXME: Copy-pasta from SH.  Move to rtl.h.
+   Given a reg rtx and a start insn, try to find the insn that sets
+   the specified reg by using the specified insn stepping function,
+   such as 'prev_nonnote_nondebug_insn_bb'.  When the insn is found,
+   try to extract the rtx of the reg set.  */
+template  inline set_of_reg
+rx_find_set_of_reg (rtx reg, rtx_insn* insn, F stepfunc,
+		bool ignore_reg_reg_copies = false)
+{
+  set_of_reg result;
+  result.insn = insn;
+  result.set_rtx = NULL_RTX;
+  result.set_src = NULL_RTX;
+
+  if (!REG_P (reg) || insn == NULL_RTX)
+return result;
+
+  for (rtx_insn* i = stepfunc (insn); i != NULL_RTX; i = stepfunc (i))
+{
+  if (BARRIER_P (i))
+	break;
+  if (!INSN_P (i) || DEBUG_INSN_P (i))
+	  continue;
+  if (reg_set_p (reg, i))
+	{
+	  if (CALL_P (i))
+	break;
+
+	  result.insn = i;
+	  result.set_rtx = set_of (reg, i);
+
+	  if (result.set_rtx == NULL_RTX || GET_CODE (result.set_rtx) != SET)
+	break;
+
+	  result.set_src = XEXP (result.set_rtx, 1);
+
+	  if (ignore_reg_reg_copies && REG_P (result.set_src))
+	{
+	  reg = result.set_src;
+	  continue;
+	}
+	  if (ignore_reg_reg_copies && SUBREG_P (result.set_src)
+	  && REG_P (SUBREG_REG (result.set_src)))
+	{
+	  reg = SUBREG_REG (result.set_src);
+	  continue;
+	}
+
+	  break;
+	}
+}
+
+  /* If the searched reg is found inside a (mem (post_inc:SI (reg))), set_of
+ will return NULL and set_rtx will be NULL.
+ In this case report a 'not found'.  result.insn will always be non-null
+ at this point, so no need to check it.  */
+  if (result.set_src != NULL && result.set_rtx == NULL)
+result.set_src = NULL;
+
+  return result;
+}
+
+/* FIXME: Move to rtlh.h.  */
+template  inline rtx_insn*
+rx_find_use_of_reg (rtx reg, rtx_insn* insn, F 

Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-13 Thread Oleg Endo
Hi,

On Tue, 2018-02-13 at 15:54 +, Sebastian Perta wrote:
> 
> The patch required some changes (the prototype, second param more
> exactly, has changed) in order to compile in the trunk.

Could you please send the patch as an attachment?  The formatting looks
a bit off (tabs spaces etc).

> So I updated this and I also same a further change to the patch: I
> disabled the replacement of the division with multiplication of
> reciprocals on -Os because it increases code size for example for the
> following division:

Do you happen to have any other numbers on the resulting code
size/speed?  Looking at the new costs that the patch introduces, I'd
expect there to be some more changes than just the 1/x...

Cheers,
Oleg


Re: PING [PATCH] RX movsicc degrade fix

2018-02-12 Thread Oleg Endo
On Mon, 2018-02-12 at 11:06 +, Sebastian Perta wrote:


> > > 1) there should be a space between * and the filename
> The spaces are there (see the changelog), the renesas mail server
> removes them sometimes

You might want to send around your patches as email attachments.  That
avoids formatting issues.

Cheers,
Oleg


[SH][committed] Fix PR 81485

2018-01-21 Thread Oleg Endo
Hi,

The following fixes PR 81485.
Tested with make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-
ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r256930.

Cheers,
Oleg

gcc/ChangeLog:
PR target/81485
* config/sh/sh-protos.h (sh_find_set_of_reg): Remove assert.Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 256929)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -228,9 +228,13 @@
 	}
 }
 
-  if (result.set_src != NULL)
-gcc_assert (result.insn != NULL && result.set_rtx != NULL);
-
+  /* If the searched reg is found inside a (mem (post_inc:SI (reg))), set_of
+ will return NULL and set_rtx will be NULL.
+ In this case report a 'not found'.  result.insn will always be non-null
+ at this point, so no need to check it.  */
+  if (result.set_src != NULL && result.set_rtx == NULL)
+result.set_src = NULL;
+ 
   return result;
 }
 


[SH][committed] Fix PR 80870

2018-01-20 Thread Oleg Endo
Hi,

The following fixed PR 80870.
For whatever reason one of the source files in config/sh was still
including  and  directly...

Committed as r256926 (trunk), r256928 (GCC 7), r256929 (GCC 6).

Cheers,
Oleg

gcc/ChangeLog:
PR target/80870
* config/sh/sh_optimize_sett_clrt.cc:
Use INCLUDE_ALGORITHM and INCLUDE_VECTOR instead of direct includes.Index: gcc/config/sh/sh_optimize_sett_clrt.cc
===
--- gcc/config/sh/sh_optimize_sett_clrt.cc	(revision 256924)
+++ gcc/config/sh/sh_optimize_sett_clrt.cc	(working copy)
@@ -20,6 +20,8 @@
 #define IN_TARGET_CODE 1
 
 #include "config.h"
+#define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -29,9 +31,6 @@
 #include "cfgrtl.h"
 #include "tree-pass.h"
 
-#include 
-#include 
-
 /*
 This pass tries to eliminate unnecessary sett or clrt instructions in cases
 where the ccreg value is already known to be the same as the constant set
Index: gcc/config/sh/sh_optimize_sett_clrt.cc
===
--- gcc/config/sh/sh_optimize_sett_clrt.cc	(revision 256924)
+++ gcc/config/sh/sh_optimize_sett_clrt.cc	(working copy)
@@ -18,6 +18,8 @@
 .  */
 
 #include "config.h"
+#define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -27,9 +29,6 @@
 #include "cfgrtl.h"
 #include "tree-pass.h"
 
-#include 
-#include 
-
 /*
 This pass tries to eliminate unnecessary sett or clrt instructions in cases
 where the ccreg value is already known to be the same as the constant set


[wwwdocs][committed] Mention GCC 7 changes for SH and RX

2018-01-20 Thread Oleg Endo
Hi,

Somehow I never managed to commit the attached patch.
Better late than never.

Cheers,
Oleg? gcc7_rx_sh_update.patch
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.96
diff -r1.96 changes.html
1065c1065,1067
< 
---
> RX
> Basic support for atomic built-in function has been added.  It is currently
> implemented by flipping interrupts off and on as needed.
1067c1069,1094
< 
---
> SH
> 
>   Support for SH5/SH64 has been removed.
>   Improved utilization of delay slots on SH2A.
>   Improved utilization of zero-displacement conditional branches.
>   The following deprecated options have been removed
> 
>   -mcbranchdi
>   -mcmpeqdi
>   -minvalid-symbols
>   -msoft-atomic
>   -mspace
>   -madjust-unroll
> 
>   
>   Support for the following SH2A instructions has been added
> 
>   mov.b  @-Rm,R0
>   mov.w  @-Rm,R0
>   mov.l  @-Rm,R0
>   mov.b  R0,@Rn+
>   mov.w  R0,@Rn+
>   mov.l  R0,@Rn+
> 
>   
> 


Re: [RX] Fix PR 81819

2018-01-13 Thread Oleg Endo
On Thu, 2018-01-11 at 17:32 +, Nick Clifton wrote:
> 
> > gcc/ChangeLog:
> > PR target/81819
> > * config/rx/rx.c (rx_is_restricted_memory_address):
> > Handle SUBREG case.
> Go ahead make my day^H^H^H^H^H^H
> 
> Approved - please apply.

Committed as r256578 (trunk) and r256579 (GCC 7 branch).

Cheers,
Oleg


[RX] Fix PR 81819

2018-01-11 Thread Oleg Endo
Hi,

The attached patch fixes PR 81819, which popped up on GCC 7.  I assume
it's also there on trunk, but can't build my app with trunk compiler
because it's got other issues.

Unfortunately I can't add the reproducer test case since it happens
only when building a bigger app with LTO.  But I have confirmed that
with this fix the app builds and runs.

OK for trunk and GCC 7?

Cheers,
Oleg

gcc/ChangeLog:
PR target/81819
* config/rx/rx.c (rx_is_restricted_memory_address):
Handle SUBREG case.Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 256385)
+++ gcc/config/rx/rx.c	(working copy)
@@ -284,6 +284,9 @@
   /* Simple memory addresses are OK.  */
   return true;
 
+case SUBREG:
+  return RX_REG_P (SUBREG_REG (mem));
+
 case PRE_DEC:
 case POST_INC:
   return false;


Re: [RX] Fix PR 81821

2018-01-11 Thread Oleg Endo
On Thu, 2018-01-11 at 15:10 +, Nick Clifton wrote:
> 
> > OK for trunk and GCC 7?
> Approved.  Do you have access to the repository, or would you like me
> to apply the patch for you ?

Thanks.  Committed as r256536 (trunk) and r256538 (GCC 7).

Cheers,
Oleg


[RX] Fix PR 81821

2018-01-11 Thread Oleg Endo
Hi,

The attached patch fixes PR 81821.  It is the same as the patch
proposed in the PR.  I have briefly checked it in a private application
that uses atomic variables and I think it's rather obvious.

I have added atomics support on RX in GCC 7 and that was an initial
bug.  Thus I'd like to also apply it to the GCC 7 branch.

OK for trunk and GCC 7?

Cheers,
Oleg

gcc/ChangeLog:
* config/rx/rx.md (BW): New mode attribute.
(sync_lock_test_and_setsi): Add mode suffix to insn output.Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md	(revision 256385)
+++ gcc/config/rx/rx.md	(working copy)
@@ -2167,6 +2167,7 @@
   [(plus "add") (minus "sub") (ior "ior") (xor "xor") (and "and")])
 
 (define_mode_iterator QIHI [QI HI])
+(define_mode_attr BW [(QI "B") (HI "W")])
 
 (define_insn "sync_lock_test_and_setsi"
   [(set (match_operand:SI 0 "register_operand"   "=r,r")
@@ -2208,7 +2209,7 @@
(set (match_dup 1)
 	(match_operand:QIHI 2 "register_operand""0"))]
   ""
-  "xchg\t%1, %0"
+  "xchg\t%1., %0"
   [(set_attr "length" "6")
(set_attr "timings" "22")]
 )


Re: [PATCH v2] libgo: Add support for sh

2018-01-10 Thread Oleg Endo
On Wed, 2018-01-10 at 06:25 -0800, Ian Lance Taylor wrote:
> 
> Thanks.  I finally took a look at this.  I don't know much about SH,
> but I don't think we want to add each SH variant as a separate GOARCH
> value.  As you can see from the list you modified in
> ibgo/go/go/build/syslist.go, the difference between GOARCH values is
> essentially the calling convention.  There are many different kinds of
> x86 processors, but since the only calling convention difference is
> between 32-bit and 64-bit, the list has only 386 and amd64.  Similarly
> it seems to me we should have only sh and shbe in the list for SH
> processors.

On SH the calling convention depends on the processor features which
are available/used.  For example there is a "no-fpu" mode which uses
software fp and passes all fp values in gp regs, while the "normal"
mode is to pass fp values in fp regs.  Some of the CPU variants like
SH2 imply "no-fpu".

Cheers,
Oleg


Re: [PATCH] RX pragma address

2018-01-05 Thread Oleg Endo
On Fri, 2018-01-05 at 12:12 +, Sebastian Perta wrote:
> 
> > > 
> > > Is this for some kind of legacy backwards compatibility of
> > > something?

> Sort of, this is required by the following tool
> https://www.renesas.com/en-eu/products/software-tools/tools/code-
> generator/ap4-application-leading-tool-applilet.html

> There are not many plans to improve this tool and other solutions
> (which might be more complex) might not be possible to implement in
> this tool.

I see.

> 
> The only way I can think of is to put the variable in a separate
> section (using section attribute) and then in the linker script put
> that section at the desired address.
> The problem is AP4 does not touch the linker script it only generates
> source code.
> 
> Do you have any other ideas/suggestions? I'm very happy to listen.

If you can compile the code only as plain C, for example

#define const_addr_var(addr, type) (*(volatile type*)addr)

#define DTCCR const_addr_var (0x00082400, uint8_t)
#define DTCSTS const_addr_var (0x0008240E, uint16_t)


If you can compile the code as C++11 there are certainly more options,
albeit probably not useful for generated code.  For example I do those
things this way:

// read-write hardware register with constant address
static constexpr hw_reg_rw> DTCCR = { };

// ready-only hardware register with constant address
static constexpr hw_reg_r> DTCSTS = { };


In both cases (C and C++) the following will compile to the same code:

void test_wr (void)
{
  DTCCR = 123;
}

int test_rd (void)
{
  return DTCSTS;
}

volatile void* get_reg_addr (void)
{
  return 
}

For a possible implementation of the hw_reg thingy see
https://github.com/shared-ptr/bits/blob/master/hw_reg.hpp

But really, if that is just for some code generator, why not simply
adjust the code generator to spit out normal C code instead of
cluttering the compiler with non-portable pragmas?  You have also
posted a similar thing for RL78 a while ago, so in the end the same
pragma thing would be re-implemented in the compiler three times (M32C,
RL78, RX)?  In that case, maybe better move it out from the M32C target
and make it available for every target?

Cheers,
Oleg


Re: [PATCH] RX pragma address

2018-01-05 Thread Oleg Endo
Hi,

On Fri, 2018-01-05 at 11:03 +, Sebastian Perta wrote:
> 
> Hello, 
> 
> The following patch adds a new pragma, "pragma address" for RX.
> The patch updates extend.texi and add a test case to the regression
> as well.
> For the test case I checked than test is getting picked up in gcc.log
> unfortunately for the .texi part I don't know where to look/what to
> do to
> get the
> documentation generated.
> 
> This is similar to the pragma address implemented for M32C.

Is this for some kind of legacy backwards compatibility of something?
There are ways how to achieve the same with standard C and C++ language
features ... so I was wondering what's the purpose of this?

Cheers,
Oleg


[SH][committed] Fix PR 83111

2017-11-23 Thread Oleg Endo
Hi,

The attached patch fixes PR 83111.
Committed to mainline as r255096 and to GCC 7 branch as r255097.

Cheers,
Oleg

gcc/ChangeLog

PR target/83111
* config/sh/sh.md (udivsi3, divsi3, sibcall_value_pcrel,
sibcall_value_pcrel_fdpic): Use local variable instead of
operands[3].
(calli_tbr_rel): Add missing operand 2.
(call_valuei_tbr_rel): Add missing operand 3.Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 255095)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2277,8 +2277,8 @@
   ""
 {
   rtx last;
+  rtx func_ptr = gen_reg_rtx (Pmode);
 
-  operands[3] = gen_reg_rtx (Pmode);
   /* Emit the move of the address to a pseudo outside of the libcall.  */
   if (TARGET_DIVIDE_CALL_TABLE)
 {
@@ -2298,16 +2298,16 @@
 	  emit_move_insn (operands[0], operands[2]);
 	  DONE;
 	}
-  function_symbol (operands[3], "__udivsi3_i4i", SFUNC_GOT);
-  last = gen_udivsi3_i4_int (operands[0], operands[3]);
+  function_symbol (func_ptr, "__udivsi3_i4i", SFUNC_GOT);
+  last = gen_udivsi3_i4_int (operands[0], func_ptr);
 }
   else if (TARGET_DIVIDE_CALL_FP)
 {
-  rtx lab = function_symbol (operands[3], "__udivsi3_i4", SFUNC_STATIC).lab;
+  rtx lab = function_symbol (func_ptr, "__udivsi3_i4", SFUNC_STATIC).lab;
   if (TARGET_FPU_SINGLE)
-	last = gen_udivsi3_i4_single (operands[0], operands[3], lab);
+	last = gen_udivsi3_i4_single (operands[0], func_ptr, lab);
   else
-	last = gen_udivsi3_i4 (operands[0], operands[3], lab);
+	last = gen_udivsi3_i4 (operands[0], func_ptr, lab);
 }
   else if (TARGET_SH2A)
 {
@@ -2318,8 +2318,8 @@
 }
   else
 {
-  rtx lab = function_symbol (operands[3], "__udivsi3", SFUNC_STATIC).lab;
-  last = gen_udivsi3_i1 (operands[0], operands[3], lab);
+  rtx lab = function_symbol (func_ptr, "__udivsi3", SFUNC_STATIC).lab;
+  last = gen_udivsi3_i1 (operands[0], func_ptr, lab);
 }
   emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
   emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]);
@@ -2405,22 +2405,22 @@
   ""
 {
   rtx last;
+  rtx func_ptr = gen_reg_rtx (Pmode);
 
-  operands[3] = gen_reg_rtx (Pmode);
   /* Emit the move of the address to a pseudo outside of the libcall.  */
   if (TARGET_DIVIDE_CALL_TABLE)
 {
-  function_symbol (operands[3], sh_divsi3_libfunc, SFUNC_GOT);
-  last = gen_divsi3_i4_int (operands[0], operands[3]);
+  function_symbol (func_ptr, sh_divsi3_libfunc, SFUNC_GOT);
+  last = gen_divsi3_i4_int (operands[0], func_ptr);
 }
   else if (TARGET_DIVIDE_CALL_FP)
 {
-  rtx lab = function_symbol (operands[3], sh_divsi3_libfunc,
+  rtx lab = function_symbol (func_ptr, sh_divsi3_libfunc,
  SFUNC_STATIC).lab;
   if (TARGET_FPU_SINGLE)
-	last = gen_divsi3_i4_single (operands[0], operands[3], lab);
+	last = gen_divsi3_i4_single (operands[0], func_ptr, lab);
   else
-	last = gen_divsi3_i4 (operands[0], operands[3], lab);
+	last = gen_divsi3_i4 (operands[0], func_ptr, lab);
 }
   else if (TARGET_SH2A)
 {
@@ -2431,8 +2431,8 @@
 }
   else
 {
-  function_symbol (operands[3], sh_divsi3_libfunc, SFUNC_GOT);
-  last = gen_divsi3_i1 (operands[0], operands[3]);
+  function_symbol (func_ptr, sh_divsi3_libfunc, SFUNC_GOT);
+  last = gen_divsi3_i1 (operands[0], func_ptr);
 }
   emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
   emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]);
@@ -6519,6 +6519,7 @@
   [(call (mem (match_operand:SI 0 "symbol_ref_operand" ""))
 	 (match_operand 1 "" ""))
(use (reg:SI FPSCR_MODES_REG))
+   (use (match_scratch 2))
(clobber (reg:SI PR_REG))]
   "TARGET_SH2A && sh2a_is_function_vector_call (operands[0])"
 {
@@ -6629,6 +6630,7 @@
 	(call (mem:SI (match_operand:SI 1 "symbol_ref_operand" ""))
 	  (match_operand 2 "" "")))
(use (reg:SI FPSCR_MODES_REG))
+   (use (match_scratch 3))
(clobber (reg:SI PR_REG))]
   "TARGET_SH2A && sh2a_is_function_vector_call (operands[1])"
 {
@@ -7044,13 +7046,11 @@
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
-  rtx call_insn;
+  rtx tmp =  gen_rtx_REG (SImode, R1_REG);
 
-  operands[3] =  gen_rtx_REG (SImode, R1_REG);
-
-  sh_expand_sym_label2reg (operands[3], operands[1], lab, true);
-  call_insn = emit_call_insn (gen_sibcall_valuei_pcrel (operands[0],
-			operands[3],
+  sh_expand_sym_label2reg (tmp, operands[1], lab, true);
+  rtx call_insn = emit_call_insn (gen_sibcall_valuei_pcrel (operands[0],
+			tmp,
 			operands[2],
 			copy_rtx (lab)));
   SIBLING_CALL_P (call_insn) = 1;
@@ -7078,12 +7078,11 @@
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
+  rtx tmp = gen_rtx_REG (SImode, R1_REG);
 
-  operands[3] =  gen_rtx_REG (SImode, R1_REG);
-
-  sh_expand_sym_label2reg (operands[3], operands[1], lab, true);
+  sh_expand_sym_label2reg (tmp, operands[1], lab, true);
   rtx i = 

Re: Limit SH strncmp inline expansion (PR target/78460)

2017-08-16 Thread Oleg Endo
On Tue, 2017-08-15 at 23:44 +, Joseph Myers wrote:
> 
> > This is an older issue.  Please also add a reference to PR 67712 in
> > your commit.  Can you also apply it to GCC 6 branch please?

> I can't reproduce the problem with GCC 6 branch; the glibc testsuite 
> builds fine without out-of-memory issues, as does the test included
> in the  patch, despite the GCC code in question being essentially
> unchanged.  That's why the bug appears to me as a regression in GCC
> 7.

OK, thanks for the investigation!

Cheers,
Oleg


Re: Limit SH strncmp inline expansion (PR target/78460)

2017-08-15 Thread Oleg Endo
On Tue, 2017-08-15 at 21:15 +, Joseph Myers wrote:
> GCC mainline built for sh4-linux-gnu runs out of memory building a
> glibc test, which calls strncmp with very large constant size
> argument, resulting in the SH inline strncmp expansion trying to
> inline a fully unrolled expansion of strncmp for that size.
> 
> This patch limits that fully unrolled expansion to the case of less
> than 32 bytes.  This is explicitly *not* trying to be optimal in any
> way (very likely a lower threshold makes sense), just to limit enough
> to avoid the out-of-memory issue in the glibc testsuite.
> 
> I have *not* run the GCC testsuite for SH.  I have verified that this
> allows the glibc testsuite to build OK, with both GCC mainline and
> GCC 7 branch (and that the included test builds quickly with patched
> GCC, runs out of memory with unpatched GCC).
> 
> OK to commit (mainline and GCC 7 branch)?

Yes, that's OK.
This is an older issue.  Please also add a reference to PR 67712 in
your commit.  Can you also apply it to GCC 6 branch please?

Thanks!

Cheers,
Oleg


Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-03 Thread Oleg Endo
On Thu, 2017-08-03 at 19:31 +0300, Alexander Monakov wrote:
> 
> My mistake, but the main point remains: STL uses 'sort' without the
> 'q'.

I think it'd be better if GCC's custom containers somewhat tried to
follow C++ standard container idioms.  Chopping off the 'q' is one step
towards it.

Cheers,
Oleg


Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-03 Thread Oleg Endo
On Thu, 2017-08-03 at 19:23 +0300, Alexander Monakov wrote:
> 
> Note that with vec::qsort -> vec::sort renaming (which should be less
> controversial, STL also has std::vector::sort)

No it doesn't?  One uses std::sort from  on a pair of random
access iterators to sort a std::vector.

Cheers,
Oleg


Re: [PATCH 12/17] Add server.h and server.c

2017-07-26 Thread Oleg Endo
On Mon, 2017-07-24 at 16:05 -0400, David Malcolm wrote:
> 
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#ifndef GCC_SERVER_H
> +#define GCC_SERVER_H
> +
> +/* Wrapper aroung "int" for file descriptors.  */
                ~~~^
              around :)


Re: [PATCH 6/6] sh: Fixes for RTL checking

2017-02-21 Thread Oleg Endo
On Tue, 2017-02-21 at 14:48 +, Segher Boessenkool wrote:
> 2017-02-21  Segher Boessenkool  
> 
>   * config/sh/sh.md (tstsi_t): If operands[0] is a SUBREG instead of
>   a REG, look at the REG it is a SUBREG of.
>   (splitter for cmpeqsi_t): Ditto.

> 
> ---
>  gcc/config/sh/sh.md | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index 2645fca..e19e977 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -561,8 +561,12 @@ (define_insn_and_split "tstsi_t"
>    gcc_assert (CONST_INT_P (operands[1]));
>  
>    HOST_WIDE_INT op1val = INTVAL (operands[1]);
> +  rtx reg = operands[0];
> +  if (SUBREG_P (reg))
> +reg = SUBREG_REG (reg);
> +  gcc_assert (REG_P (reg));
>    bool op0_dead_after_this =
> - sh_reg_dead_or_unused_after_insn (curr_insn, REGNO
> (operands[0]));
> + sh_reg_dead_or_unused_after_insn (curr_insn, REGNO (reg));
>  

Thanks for dealing with those.

That SUBREG vs. REG stuff is annoying.  Isn't there a simple function
that just does the right thing which can be used instead of manually
open-coding these checks over and over again?

Cheers,
Oleg


Re: [PATCH] Fix buffer overflow in SH expand_cbranchdi4 (PR target/79462)

2017-02-14 Thread Oleg Endo
On Tue, 2017-02-14 at 09:22 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The following patch fixes a buffer overflow in the SH backend.
> r235698 removed an operand (clobber of match_scratch) from the
> various
> cbranch pattersn that called expand_cbranchdi4 as well as all but
> one references to operands[4] in that code.  Now that the insn only
> has 4 operands, clearing operands[4] is a buffer overflow.
> 
> Tested by Kaz (thanks).
> In the PR Oleg asked for a comment, but I'm not sure how useful is
> it to document that something used to be cleared and is not anymore,
> because it doesn't exist.
> 
> Ok for trunk (or suggested wording for a comment)?
> 

Sorry, I haven't checked the code in a while.  If it's the last
reference, then of course a comment would be just confusing like you've
said.  Thanks for figuring it out.  OK as it is for trunk and the other
branches.

Cheers,
Oleg


Re: config/ sync with binutils vs. removing gcc targets.

2016-12-07 Thread Oleg Endo
Hi,

Yeah, my SH5/SH64 removal procedures might have been a little too
radical, sorry about that.  However ...

On Wed, 2016-12-07 at 09:10 +1030, Alan Modra wrote:
> I understand that config/ in the gcc repository is the master source
> for binutils-gdb config/.  If that's the case then people removing
> gcc support for particular targets need to be careful about their
> config/ edits.  If a target is still supported in binutils, then
> please don't rip out config/ support for the target.

How exactly is this going to work when support for a configuration has
been removed from the compiler, but not from the configuration itself?
 That's just confusing for the user and doesn't make sense, I believe.

Cheers,
Oleg


Re: [PATCH v3] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-12-05 Thread Oleg Endo
On Mon, 2016-12-05 at 04:00 -0600, Segher Boessenkool wrote:
> On Mon, Dec 05, 2016 at 10:22:13AM +0100, Dominik Vogt wrote:
> > 
> > On Sat, Dec 03, 2016 at 07:19:13PM -0600, Segher Boessenkool wrote:
> > > 
> > > [ I did not see this patch before, sorry. ]
> > > 
> > > This causes the second half of PR78638.
> > > 
> > > On Thu, Dec 01, 2016 at 04:30:08PM +0100, Dominik Vogt wrote:
> > > > 
> > > > --- a/gcc/combine.c
> > > > +++ b/gcc/combine.c
> > > > @@ -5600,6 +5600,18 @@ combine_simplify_rtx (rtx x,
> > > > machine_mode op0_mode, int in_dest,
> > > >      && OBJECT_P (SUBREG_REG (XEXP (x,
> > > > 0)))
> > > >  {
> > > >    rtx cond, true_rtx, false_rtx;
> > > > +  unsigned HOST_WIDE_INT nz;
> > > > +
> > > > +  /* If the operation is an AND wrapped in a SIGN_EXTEND
> > > > or ZERO_EXTEND with
> > > > +    either operand being just a constant single bit
> > > > value, do nothing since
> > > > +    IF_THEN_ELSE is likely to increase the expression's
> > > > complexity.  */
> > > > +  if (HWI_COMPUTABLE_MODE_P (mode)
> > > > +     && pow2p_hwi (nz = nonzero_bits (x, mode))
> > > > +     && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
> > > > +   && GET_CODE (XEXP (x, 0)) == AND
> > > > +   && CONST_INT_P (XEXP (XEXP (x, 0), 0))
> > > > +   && UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
> > > > +     return x;
> > > The code does not match the comment: the "!" should not be
> > > there.  How
> > > did it fix anything on s390 *with* that "!"?  That does not make
> > > much
> > > sense.
> > Sorry for breaking this.  With the constant changes in the
> > patterns this is supposed to fix it seems I've lost track of the
> > status quo.  I'll check what went wrong with the patch; in the
> > mean time Andreas will revert this, or if it's urgent, feel free
> > to do that yourself.

> I have tested that removing that ! cures all regressions.  I do not
> know if it still fixes what this patch intended to fix, of course.

I haven't been following this, but it seems some of these changes also
triggered bleh on SH:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78633

Cheers,
Oleg


Re: [RFC][PATCH] Canonicalize address multiplies

2016-10-04 Thread Oleg Endo
On Tue, 2016-10-04 at 12:53 +, Wilco Dijkstra wrote:
> GCC currently doesn't canonicalize address expressions. As a result
> inefficient code is generated even for trivial index address
> expressions,
> blocking CSE and other optimizations:
> 
> int f(int *p, int i) { return p[i+2] + p[i+1]; }
> 
>   sxtwx1, w1
>   add x1, x1, 2
>   add x2, x0, x1, lsl 2
>   ldr w0, [x0, x1, lsl 2]
>   ldr w1, [x2, -4]
>   add w0, w1, w0
>   ret
> 
> After this patch:
> 
>   add x1, x0, x1, sxtw 2
>   ldp w0, w2, [x1, 4]
>   add w0, w2, w0
>   ret
> 
> The reason for this is that array index expressions are preferably
> kept in the *(p + (i + C0) * C1) form eventhough it is best on most
> targets to make use of an offset in memory accesses - ie. *(p + i *
> C1 + (C0*C1)).
> 
> This patch disables the folding in fold_plusminus_mult_expr that
> changes the latter form into the former.  Unfortunately it isn't
> possible to know it is an address expression, and neither is there a
> way to decide when C0*C1 is too complex. 
> 
> So is there a better way/place to do this, or do we need an address
> canonicalization phase in the tree that ensures we expand addresses
> in an efficient manner, taking into account target offsets?

There's been an effort to implement address mode selection (AMS)
optimization in GCC as part of the GSoC program.  However, it hasn't
been mainlined yet and it's for SH only, but I'd like to move that
forward and make it available to other backends, too.

It's an RTL pass and works by analyzing memory accesses inside basic
blocks, figuring out the effective address expressions, querying the
backend for address mode alternatives for each memory access and the
associated costs.  With that information it tries to find a minimal
solution (minimizing address register calculations and minimizing
address mode alternative costs), which is currently implemented with
backtracking.

For SH, the AMS pass can convert your example above from this

_f:
mov r5,r0
add #2,r0
shll2   r0
mov r4,r1
add r0,r1
mov.l   @(r0,r4),r0
add #-4,r1
mov.l   @r1,r2
rts 
add r2,r0

into this:
_f:
shll2   r5
add r5,r4
mov.l   @(4,r4),r0
mov.l   @(8,r4),r1
rts 
add r1,r0

.. which is minimal on SH.


It also fixes several missed auto-inc opportunities and was meant to
allow further address mode related optimizations like displacement
range fitting or access reordering.

Although not yet ready for mainline, the current code can be found on
github:

https://github.com/erikvarga/gcc/commits/master

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

The way AMS gets address mode information from the backend is different
to GCC's current approach:

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11946

Since the SH ISA is a bit irregular, there are a bunch of exceptions
and special cases in the cost calculations which take surrounding insns
and mem accesses into account.  I guess a more regular or less
restrictive ISA wouldn't need too many special cases.


Cheers,
Oleg


Re: [PATCH v3] Optimize strchr to strlen

2016-09-29 Thread Oleg Endo
On Fri, 2016-09-23 at 23:10 +0900, Oleg Endo wrote:
> On Fri, 2016-09-23 at 14:07 +, Wilco Dijkstra wrote:
> > 
> > After discussion (https://gcc.gnu.org/ml/gcc-patches/2016-09/msg007
> > 18
> > .html)
> > here is the latest version of the strchr patch.  This uses a gimple
> > statement for
> > the pointer addition so the gsi_prev always points at the strlen
> > call.
> > 
> > Optimize strchr (s, 0) to s + strlen (s).  strchr (s, 0) appears a
> > common
> > idiom for finding the end of a string, however it is not a very
> > efficient
> > way of doing so.  Strlen is a much simpler operation which is
> > significantly
> > faster (eg. on x86 strlen is 50% faster for strings of 8 bytes and
> > about
> > twice as fast as strchr on strings of 1KB).
> > 
> > OK for commit?

> Please add PR tree-optimization/61056 to the changelog so that it
> gets linked in bugzilla.

Notice that the "PR AAA/BBB" markers from the changelog should also be
included in the SVN commit message.  Otherwise the bugzilla commit hook
doesn't notice it, because it looks at the commit messages and not at
the contents of the ChangeLog files.

Cheers,
Oleg




Re: [SH][committed] Fix cset_zero pattern regressions

2016-09-27 Thread Oleg Endo
On Sun, 2016-09-25 at 16:06 +0900, Oleg Endo wrote:
> 
> This fixes a fallout that actually goes back to 5.0 but went
> unnoticed.  The costs for movt and movrt type of insns were not
> correctly reported and ifcvt thus made some bad choices for SH.  A
> new cset_zero pattern variant is also required to fix the matching
> for some recent changes in the middle end.
> 
> Tested on sh-elf with
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-
> m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> Committed as r240471.
> Backports to GCC 6 and GCC 5 will follow.
> 

While working on the backport, I noticed that I've screwed up the costs
return value.  Funny that it had no effect in this case.

Maybe rtx_costs functions should be made to return something like
std::optional to avoid these kind of mistakes.  It would also cut
the code by half in most cases.

Anyway, fixed as attached, re-tested as above and committed as r240533.

Cheers,
Oleg

gcc/ChangeLog
PR target/51244
* config/sh/sh.c (sh_rtx_costs): Fix return value of SET of movt and
movrt patterns.  Match them before anything else in the SET case.

Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 240471)
+++ gcc/config/sh/sh.c	(working copy)
@@ -3199,6 +3199,12 @@
 	 vector-move, so we have to provide the correct cost in the number
 	 of move insns to load/store the reg of the mode in question.  */
 case SET:
+  if (sh_movt_set_dest (x) != NULL || sh_movrt_set_dest (x) != NULL)
+	{
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
+
   if (register_operand (SET_DEST (x), VOIDmode)
 	&& (register_operand (SET_SRC (x), VOIDmode)
 		|| satisfies_constraint_Z (SET_SRC (x
@@ -3208,10 +3214,6 @@
   / mov_insn_size (mode, TARGET_SH2A));
 	  return true;
 }
-
-  if (sh_movt_set_dest (x) != NULL || sh_movrt_set_dest (x) != NULL)
-	return COSTS_N_INSNS (1);
-
   return false;
 
 /* The cost of a mem access is mainly the cost of the address mode.  */


[SH][committed] Fix cset_zero pattern regressions

2016-09-25 Thread Oleg Endo
Hi,

This fixes a fallout that actually goes back to 5.0 but went unnoticed.
The costs for movt and movrt type of insns were not correctly reported
and ifcvt thus made some bad choices for SH.  A new cset_zero pattern
variant is also required to fix the matching for some recent changes
in the middle end.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-
m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r240471.
Backports to GCC 6 and GCC 5 will follow.

Cheers,
Oleg


gcc/ChangeLog
PR target/51244
* config/sh/sh.c (sh_movt_set_dest, sh_movrt_set_dest): Add overloads.
(sh_rtx_costs): Handle SET of movt and movrt patterns.
* cnofig/sh/sh-protos.h (sh_movt_set_dest, sh_movrt_set_dest): Forward
declare new overloads.
* config/sh/sh.md (*cset_zero): Add variant that takes a treg_set_expr
operand.

gcc/testsuite/ChangeLog
PR target/51244
* gcc.target/sh/pr51244-11.c: Add more detailed expected insn matching.
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 240470)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -235,7 +235,9 @@
 
 extern bool sh_is_nott_insn (const rtx_insn* i);
 extern rtx sh_movt_set_dest (const rtx_insn* i);
+extern rtx sh_movt_set_dest (const_rtx i);
 extern rtx sh_movrt_set_dest (const rtx_insn* i);
+extern rtx sh_movrt_set_dest (const_rtx i);
 
 inline bool sh_is_movt_insn (const rtx_insn* i)
 {
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 240470)
+++ gcc/config/sh/sh.c	(working copy)
@@ -3208,6 +3208,10 @@
   / mov_insn_size (mode, TARGET_SH2A));
 	  return true;
 }
+
+  if (sh_movt_set_dest (x) != NULL || sh_movrt_set_dest (x) != NULL)
+	return COSTS_N_INSNS (1);
+
   return false;
 
 /* The cost of a mem access is mainly the cost of the address mode.  */
@@ -11703,13 +11707,15 @@
 rtx
 sh_movt_set_dest (const rtx_insn* i)
 {
-  if (i == NULL)
-return NULL;
+  return i == NULL ? NULL : sh_movt_set_dest (PATTERN (i));
+}
 
-  const_rtx p = PATTERN (i);
-  return GET_CODE (p) == SET
-	 && arith_reg_dest (XEXP (p, 0), SImode)
-	 && t_reg_operand (XEXP (p, 1), VOIDmode) ? XEXP (p, 0) : NULL;
+rtx
+sh_movt_set_dest (const_rtx pat)
+{
+  return GET_CODE (pat) == SET
+	 && arith_reg_dest (XEXP (pat, 0), SImode)
+	 && t_reg_operand (XEXP (pat, 1), VOIDmode) ? XEXP (pat, 0) : NULL;
 }
 
 /* Given an insn, check whether it's a 'movrt' kind of insn, i.e. an insn
@@ -11718,18 +11724,20 @@
 rtx
 sh_movrt_set_dest (const rtx_insn* i)
 {
-  if (i == NULL)
-return NULL;
+  return i == NULL ? NULL : sh_movrt_set_dest (PATTERN (i));
+}
 
-  const_rtx p = PATTERN (i);
-
+rtx
+sh_movrt_set_dest (const_rtx pat)
+{
   /* The negc movrt replacement is inside a parallel.  */
-  if (GET_CODE (p) == PARALLEL)
-p = XVECEXP (p, 0, 0);
+  if (GET_CODE (pat) == PARALLEL)
+pat = XVECEXP (pat, 0, 0);
 
-  return GET_CODE (p) == SET
-	 && arith_reg_dest (XEXP (p, 0), SImode)
-	 && negt_reg_operand (XEXP (p, 1), VOIDmode) ? XEXP (p, 0) : NULL;
+  return GET_CODE (pat) == SET
+	 && arith_reg_dest (XEXP (pat, 0), SImode)
+	 && negt_reg_operand (XEXP (pat, 1), VOIDmode) ? XEXP (pat, 0) : NULL;
+
 }
 
 /* Given an insn and a reg number, tell whether the reg dies or is unused
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 240470)
+++ gcc/config/sh/sh.md	(working copy)
@@ -8524,6 +8524,24 @@
   [(set_attr "type" "arith") ;; poor approximation
(set_attr "length" "4")])
 
+(define_insn_and_split "*cset_zero"
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(if_then_else:SI (match_operand 1 "treg_set_expr_not_const01")
+			 (match_dup 0) (const_int 0)))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1 && TARGET_ZDCBRANCH && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(if_then_else:SI (match_dup 1) (match_dup 0) (const_int 0)))]
+{
+  sh_treg_insns ti = sh_split_treg_set_expr (operands[1], curr_insn);
+  if (ti.remove_trailing_nott ())
+operands[1] = gen_rtx_EQ (SImode, get_t_reg_rtx (), const0_rtx);
+  else
+operands[1] = gen_rtx_EQ (SImode, get_t_reg_rtx (), const1_rtx);
+})
+
 (define_expand "cstoresf4"
   [(set (match_operand:SI 0 "register_operand")
 	(match_operator:SI 1 "ordered_comparison_operator"
Index: gcc/testsuite/gcc.target/sh/pr51244-11.c
===
--- gcc/testsuite/gcc.target/sh/pr51244-11.c	(revision 240470)
+++ gcc/testsuite/gcc.target/sh/pr51244-11.c	(working copy)
@@ -1,8 +1,11 @@
 /* Check that zero-displacement branches are used instead of branch-free
-   execution patterns.  */
+   execution patterns.
+   This is usually handled by the *cset_zero patterns.  */
 /* { dg-do compile }  */
-/* { dg-options "-O1 

Re: [PATCH v3] Optimize strchr to strlen

2016-09-23 Thread Oleg Endo
On Fri, 2016-09-23 at 14:07 +, Wilco Dijkstra wrote:
> After discussion (https://gcc.gnu.org/ml/gcc-patches/2016-09/msg00718
> .html)
> here is the latest version of the strchr patch.  This uses a gimple
> statement for
> the pointer addition so the gsi_prev always points at the strlen
> call.
> 
> Optimize strchr (s, 0) to s + strlen (s).  strchr (s, 0) appears a
> common
> idiom for finding the end of a string, however it is not a very
> efficient
> way of doing so.  Strlen is a much simpler operation which is
> significantly
> faster (eg. on x86 strlen is 50% faster for strings of 8 bytes and
> about
> twice as fast as strchr on strings of 1KB).
> 
> OK for commit?

Please add PR tree-optimization/61056 to the changelog so that it gets
linked in bugzilla.

Cheers,
Oleg


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-13 Thread Oleg Endo
On Tue, 2016-09-13 at 12:36 +, Wilco Dijkstra wrote:
> Richard Biener  wrote:
> > 
> > On Wed, May 18, 2016 at 2:29 PM, Wilco Dijkstra  > .com> wrote:
> > > 
> > > Richard Biener wrote:
> > > 
> > > > 
> > > > Yeah ;)  I'm currently bootstrapping/testing the patch that
> > > > makes it possible to
> > > > write all this in match.pd.
> > > So what was the conclusion? Improving match.pd to be able to
> > > handle more cases
> > > like this seems like a nice thing.
> > I'm stuck with fallout and making this work requires some serious
> > thought.  Don't
> > hold your breath here :/
> > 
> > The restricted case of strchr (a, 0) -> strlen () can be made
> > working
> > more easily
> > but I didn't yet try to implement a restriction only allowing the
> > cases that would work.
> > 
> > Meanwhile the strlenopt pass would be an appropriate place to
> > handle
> > this transform
> > (well, if we now agree on its usefulness).
> I'd like to pick this up again so we can make GCC7 a bit less
> inefficient for this case.
> (original thread: https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00870
> .html)
> 
> We've seen several different proposals for where/how to do this
> simplification, why did you 
> say strlenopt is best? It would be an unconditional strchr (a, 0) ->
> a + strlen (a) rewrite,
> ie. completely unrelated to what strlenopt does. We do all the other
> simplifications based
> on constant arguments in builtins.c and gimple-fold.c, why is strchr
> (s, 0) different?
> 

BTW, there are two PRs for this: 61056 and 32650.  Please take them
into account when committing something for this issue.

Cheers,
Oleg



Re: [wwwdocs] superh.com has gone poker(?)

2016-08-22 Thread Oleg Endo
On Mon, 2016-08-22 at 14:27 +0200, Gerald Pfeifer wrote:
> It appears www.superh.com no longer has anything to do with
> processors,
> but poker.  Better remove this (old) link.

Yeah, I think I've removed one of those links from some other page a
while ago.  Also, note that SH5 support has been removed again recently
so uhm ... 

Cheers,
Oleg


Re: protected alloca class for malloc fallback

2016-08-11 Thread Oleg Endo
On Wed, 2016-08-10 at 21:31 -0400, Trevor Saunders wrote:

> 
> Well, I'd say the compromises made in std::string make it pretty
> terrible for general purpose use accept where perf and memory doesn't
> matter at all.
> 
> std::vector isn't very great size wise either, imho the size / 
> capacity fields would be much better put in the buffer than in the 
> struct itself, to save 2 * sizeof (void *) in sizeof std::vector.

There's nothing in the standard that says those fields have to go into
the vector struct/class itself and not into the data buffer.  It just
happens to be the way it's implemented in libstdc++.  E.g. libc++
implements some things in different ways to save a few bytes here and
there.

It looks more practical to put those fields into the container itself,
otherwise you'd have to do a nullptr check every time you access the
size field etc.  Although newer compilers optimize away the redundant
nullptr checks, older compilers (which GCC wants to be good with) might
not do it.  In any case, it seems to generate slightly bigger code, at
least in one instance (see attachment).

Anyway, normally there is more data in the vectors than there are
vectors around, so whether sizeof (vector) is 8 or 24 bytes doesn't
matter.

> 
> or just having your stack_string type convert to the normal string 
> type so that you can pass mutable references.

But that would imply copying, wouldn't it?

Cheers,
Oleg
#if 0
template 
class vector
{
public:
  unsigned int size (void) const { return m_buffer != nullptr ? m_buffer->size : 0; }
  bool empty (void) const { return size () == 0; }

  unsigned int capacity (void) const { return m_buffer != nullptr ? m_buffer->capacity : 0; }

  T* data (void) { return (T*)(m_buffer + 1); }
  const T* data (void) const { return (const T*)(m_buffer + 1); }

  T& operator [] (unsigned int i) { return data ()[i]; }
  const T& operator [] (unsigned int i) const { return data ()[i]; }

private:
  struct buffer_header
  {
unsigned int size;
unsigned int capacity;
  };

  buffer_header* m_buffer;
};


int foo (const vector& x)
{
  if (x.empty ())
return 0;

  int r = 0;
  for (unsigned int i = 0; i < x.size (); ++i)
r += x[i];

  return r;
}

/*
-O2 -m2a
	mov.l	@r4,r2
	tst	r2,r2
	bt	.L5
	mov.l	@r2,r1
	tst	r1,r1
	bt.s	.L5
	shll2	r1
	add	#-4,r1
	shlr2	r1
	add	#8,r2
	mov	#0,r0
	add	#1,r1
	.align 2
.L3:
	mov.l	@r2+,r3
	dt	r1
	bf.s	.L3
	add	r3,r0
	rts/n
	.align 1
.L5:
	rts
	mov	#0,r0
*/
#endif


#if 1

template 
class vector
{
public:
  unsigned int size (void) const { return m_size; }
  bool empty (void) const { return size () == 0; }

  unsigned int capacity (void) const { return m_capacity; }

  T* data (void) { return (T*)(m_buffer); }
  const T* data (void) const { return (const T*)(m_buffer); }

  T& operator [] (unsigned int i) { return data ()[i]; }
  const T& operator [] (unsigned int i) const { return data ()[i]; }

private:
  unsigned int m_size;
  unsigned int m_capacity;
  T* m_buffer;
};


int foo (const vector& x)
{
  if (x.empty ())
return 0;

  int r = 0;
  for (unsigned int i = 0; i < x.size (); ++i)
r += x[i];

  return r;
}

/*
-O2 -m2a
	mov.l	@r4,r1
	tst	r1,r1
	bt.s	.L7
	mov	#0,r0
	shll2	r1
	mov.l	@(8,r4),r2
	add	#-4,r1
	shlr2	r1
	add	#1,r1
	.align 2
.L3:
	mov.l	@r2+,r3
	dt	r1
	bf.s	.L3
	add	r3,r0
.L7:
	rts/n
*/

#endif



Re: protected alloca class for malloc fallback

2016-08-10 Thread Oleg Endo
On Tue, 2016-08-09 at 13:41 -0400, Trevor Saunders wrote:

> If what you want is the ability to put the buffer on the stack
> instead  of the heap then I think a stack_string class that
> interoperates with  your string class is the thing you want.

I'd avoid a separate stack_string class.  Instead use a linear
allocator with storage allocated on the stack, e.g.:

typedef std::basic_string 
>  I don't really see anything wrong with a string class being a really
>  fancy smart pointer that has a bunch of useful string stuff on it. 


That's what std::string basically is?


>  As for operator == I'd be fairly ok with that, other than it hiding 
> a O(N) operation in ==.

Yes, it makes it more difficult to carry out algorithm complexity
analysis using grep...


> Regretably necessary sure, but I'm not sure its funny.

Hence the quotes.

> The first big problem with using the stl is that the subset available 
> in C++98 isn't that great, you could maybe hack up libstdc++ so that 
> you can use newer types just without the bits that use newer language 
> features, but that would take some work.

Or just wait until people have agreed to switch to C++11 or C++14.  I
don't think in practice anybody uses an C++11-incapable GCC to build a
newer GCC these days.

> The other big problem is that the stl is often too general, and tries 
> to be too simple.  std::string is actually a good example of both of 
> those problems.  There's really no reason it should use size_t 
> instead of uint32_t for the string length / capacity.

Yes, some of the things in the STL are collections of compromises for
general-purpose usage.  If you're doing something extra fancy, most
likely you can outperform the generic STL implementation.  But very
often it's actually not needed.


>   It would also be a lot better if it had separate types for strings 
> where you want an internal buffer or don't.

Using custom allocators is one way.  For example ...

template  struct linear_allocator_with_buffer;

template 
using stack_string =
  std::basic_string;

But then ...

stack_string<32> a = "a";
stack_string<64> b = "b";

b += a;

... will not work because they are different types.

One drawback of using custom allocators for that kind of thing is being
unable to pass the above stack_string to a function that takes a "const
std::string&" because they differ in template parameters.  But that's
where std::string_view comes in.


Cheers,
Oleg


Re: protected alloca class for malloc fallback

2016-08-09 Thread Oleg Endo
On Mon, 2016-08-08 at 13:39 -0400, Trevor Saunders wrote:


> First sizeof std::string is 32 on x86_64, a char *, a size_t for the
> length, and a 16 byte union of a size_t for allocated size and a 16 
> byte buffer for short strings.  I suppose some of this is required by 
> the C++ standard,

I recommend checking what others have figured regarding that matter
https://github.com/elliotgoodrich/SSO-23


>  but I doubt we really need to care about strings longer than 2^32 in
> gcc.  If we put the length and allocated size in the buffer,
> and have a separate class for stack allocated buffers I think we can 
> have a string that is just sizeof void *.

This idea has one flaw in that it does not allow having objects by
value.  It essentially *requires* one to *always* allocate them on the
heap via new/delete.  Being able to store objects by value is useful in
many situations.  So if you do the above, the next logical step that
will follow is a smart-pointer-like wrapper that allows value
semantics.  Because eventually somebody will want that operator == or
operator < e.g. for associative containers.

> 
> Second it would be useful performance wise to have a std::string_view
> type class, but that is c++14 or 17? only so we'd need to import it 
> into gcc or something.

http://en.cppreference.com/w/cpp/experimental/basic_string_view
http://en.cppreference.com/w/cpp/string/basic_string_view

It's "funny" that GCC contains a C++ stdlib implementation and so
little is actually used by GCC itself.

Cheers,
Oleg



Re: protected alloca class for malloc fallback

2016-08-05 Thread Oleg Endo
On Fri, 2016-08-05 at 19:55 +0200, Richard Biener wrote:

> 
> Please don't use std::string.  For string building you can use
> obstacks.
> 

Just out of curiosity ... why?  I remember there was some discussion
about it, what was the conclusion?  Is that now a general rule or does
it depend on the context where strings are used?

Cheers,
Oleg


Re: [PATCH] Add constexpr to iterator adaptors, array and range access

2016-07-15 Thread Oleg Endo
On Sat, 2016-07-16 at 00:09 +0100, Jonathan Wakely wrote:
> This patch implements http://wg21.link/p0031 which adds constexpr to
> most operations on std::reverse_iterator, std::move_iterator,
> std::array, as well as std::advance, std::distance, and the
> range-access functions std::begin, std::rbegin etc.
> 
> Strictly speaking, those functions should only be constexpr in C++17
> and not before, but this patch makes them constexpr whenever
> possible.
> That means the changes are fully implemented for C++14 (and the
> feature-test macro is defined) but only partially implemented for
> C++11, because some of the functions can't be constexpr in C++11.
> 
> My thinking is that if the committee has decided that these functions
> *should* be constexpr, and changed them for C++17, then it doesn't
> serve users to make them non-constexpr in C++11 and C++14 just
> because
> the standard says so.
> 
> How do other people feel about that?
> 
The alternative would be to define a new _GLIBCXX17_CONSTEXPR macro
> and use it in all these places, so they're only constexpr in C++17
> (and probably for -std=gnu++14 too, but not -std=c++14).
> 
> How strict do we want to be about obeying the "implementors can't add
> constexpr" rule in these cases?

Other std libs are even more ignorant regarding this and do stuff like
defining several C++11 features and functionalities even if C++11 is
not enabled, which can be annoying at times.

One thing that could happen with those extra constexpr, is having
people accidentally writing non-C++11/14-compliant code on a newer
C++17 toolchain, although they actually wanted to write C++11/14
compliant code.

GCC itself, with its conservative toolchain requirements, is probably a
good example.  If GCC's compiler requirement is bumped from C++98/03 to
C++11 some day, most GCC developers are probably going to use a pretty
recent GCC version for GCC development.  As a consequence, silent non
-compliant constexpr related constructs might slip in over and over
again simply because people don't notice.  So everyone would need to
keep some older C++11-only toolchain version just for testing the
build.  I can imagine this to be a source of frustration.  If users
want to write C++11 or C++14 code, the toolchain should assist them in
doing so as much as possible.

I'd vote for the _GLIBCXX17_CONSTEXPR macro.

Cheers,
Oleg


[SH][committed] Fix build after changes for PR 52171

2016-06-04 Thread Oleg Endo
Hi,

The recent changes for PR 52171 didn't update the users of the renamed
function 'move_by_pieces_ninsns'.  The attached patch fixes this.

Tested with "make all".  Committed as r237090.

Cheers,
Oleg

gcc/ChangeLog:
PR tree-optimization/52171
* config/sh/sh.c (sh_use_by_pieces_infrastructure_p): Use
by_pieces_ninsns instead of move_by_pieces_ninsns.Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 237089)
+++ gcc/config/sh/sh.c	(working copy)
@@ -12506,11 +12506,11 @@
   switch (op)
 {
   case MOVE_BY_PIECES:
-	return move_by_pieces_ninsns (size, align, MOVE_MAX_PIECES + 1)
+	return by_pieces_ninsns (size, align, MOVE_MAX_PIECES + 1, op)
 	  < (!speed_p ? 2 : (align >= 32) ? 16 : 2);
   case STORE_BY_PIECES:
   case SET_BY_PIECES:
-	return move_by_pieces_ninsns (size, align, STORE_MAX_PIECES + 1)
+	return by_pieces_ninsns (size, align, STORE_MAX_PIECES + 1, op)
 	  < (!speed_p ? 2 : (align >= 32) ? 16 : 2);
   default:
 	return default_use_by_pieces_infrastructure_p (size, align,


[SH][committed] Avoid potential slient wrong-code with reg+reg addr. modes

2016-06-04 Thread Oleg Endo
Hi,

The attached patch removes the hardcoded "r0" when printing reg+reg
addressing mode mems on SH.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r237088.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.c (sh_print_operand_address): Don't use hardcoded 'r0'
for reg+reg addressing mode.diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 2bd917a..74327aa 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -1038,8 +1038,16 @@ sh_print_operand_address (FILE *stream, machine_mode /*mode*/, rtx x)
 	  int base_num = true_regnum (base);
 	  int index_num = true_regnum (index);
 
-	  fprintf (stream, "@(r0,%s)",
-		   reg_names[MAX (base_num, index_num)]);
+	  /* If base or index is R0, make sure that it comes first.
+		 Usually one of them will be R0, but the order might be wrong.
+		 If neither base nor index are R0 it's an error and we just
+		 pass it on to the assembler.  This avoids silent wrong code
+		 bugs.  */
+	  if (base_num == 0 && index_num != 0)
+		std::swap (base_num, index_num);
+
+	  fprintf (stream, "@(%s,%s)", reg_names[index_num],
+	   reg_names[base_num]);
 	  break;
 	}
 


[SH][committed] Use default ASM_OUTPUT_SYMBOL_REF

2016-05-31 Thread Oleg Endo
Hi,

Since the SH5 stuff is gone, there's no need to override the
 ASM_OUTPUT_SYMBOL_REF target macro anymore.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236930.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.h (ASM_OUTPUT_SYMBOL_REF): Remove macro and use the
default implementation.diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index d724bd2..0403616 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1718,15 +1718,6 @@ extern bool current_function_interrupt;
? (24) \
: (unsigned) -1)
 
-/* This is how to output a reference to a symbol_ref.  On SH5,
-   references to non-code symbols must be preceded by `datalabel'.  */
-#define ASM_OUTPUT_SYMBOL_REF(FILE,SYM)			\
-  do 			\
-{			\
-  assemble_name ((FILE), XSTR ((SYM), 0));		\
-}			\
-  while (0)
-
 /* This is how to output an assembler line
that says to advance the location counter
to a multiple of 2**LOG bytes.  */


[SH][committeð] Remove SH5 target regs leftovers

2016-05-31 Thread Oleg Endo
Hi,

The attached patch removes the SH5 target register leftovers.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236928.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/constraints.md (b): Remove constraint.
* config/sh/predicates.md (arith_reg_operand): Remove TARGET_REGISTER_P.
* config/sh/sh-modes.def (PDI): Remove.
* config/sh/sh.c (sh_target_reg_class,
sh_optimize_target_register_callee_saved): Remove functions.
(sh_option_override): Don't set MASK_SAVE_ALL_TARGET_REGS.
(sh_expand_epilogue): Update comment.
(sh_hard_regno_mode_ok, sh_register_move_cost, calc_live_regs,
sh_secondary_reload): Remove TARGET_REGS related code.
* config/sh/sh.h (FIRST_TARGET_REG, LAST_TARGET_REG,
TARGET_REGISTER_P): Remove macros.
(SH_DBX_REGISTER_NUMBER, REG_ALLOC_ORDER): Remove target regs.
* config/sh/sh.md (PR_MEDIA_REG, T_MEDIA_REG, FR23_REG, TR0_REG,
TR1_REG, TR2_REG): Remove constants.
* config/sh/sh.opt (SAVE_ALL_TARGET_REGS): Remove.diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md
index 644a0f0..c3e9d55 100644
--- a/gcc/config/sh/constraints.md
+++ b/gcc/config/sh/constraints.md
@@ -62,9 +62,6 @@
 (define_register_constraint "a" "ALL_REGS"
   "@internal")
 
-(define_register_constraint "b" "TARGET_REGS"
-  "Branch target registers.")
-
 (define_register_constraint "c" "FPSCR_REGS"
   "Floating-point status register.")
 
diff --git a/gcc/config/sh/predicates.md b/gcc/config/sh/predicates.md
index 4de90af..4b93c6d 100644
--- a/gcc/config/sh/predicates.md
+++ b/gcc/config/sh/predicates.md
@@ -34,7 +34,6 @@
 	return 1;
 
   return (regno != T_REG && regno != PR_REG
-	  && ! TARGET_REGISTER_P (regno)
 	  && regno != FPUL_REG && regno != FPSCR_REG
 	  && regno != MACH_REG && regno != MACL_REG);
 }
diff --git a/gcc/config/sh/sh-modes.def b/gcc/config/sh/sh-modes.def
index cab2011a..6db9943 100644
--- a/gcc/config/sh/sh-modes.def
+++ b/gcc/config/sh/sh-modes.def
@@ -17,9 +17,6 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-/* PDI mode is used to represent a function address in a target register.  */
-PARTIAL_INT_MODE (DI, 64, PDI);
-
 /* Vector modes.  */
 VECTOR_MODE  (INT, QI, 2);/* V2QI */
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index a36b098..2bd917a 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -234,8 +234,6 @@ static int sh_variable_issue (FILE *, int, rtx_insn *, int);
 static bool sh_function_ok_for_sibcall (tree, tree);
 
 static bool sh_can_follow_jump (const rtx_insn *, const rtx_insn *);
-static reg_class_t sh_target_reg_class (void);
-static bool sh_optimize_target_register_callee_saved (bool);
 static bool sh_ms_bitfield_layout_p (const_tree);
 
 static void sh_init_builtins (void);
@@ -465,11 +463,6 @@ static const struct attribute_spec sh_attribute_table[] =
 
 #undef TARGET_CAN_FOLLOW_JUMP
 #define TARGET_CAN_FOLLOW_JUMP sh_can_follow_jump
-#undef TARGET_BRANCH_TARGET_REGISTER_CLASS
-#define TARGET_BRANCH_TARGET_REGISTER_CLASS sh_target_reg_class
-#undef TARGET_BRANCH_TARGET_REGISTER_CALLEE_SAVED
-#define TARGET_BRANCH_TARGET_REGISTER_CALLEE_SAVED \
-  sh_optimize_target_register_callee_saved
 
 #undef TARGET_MS_BITFIELD_LAYOUT_P
 #define TARGET_MS_BITFIELD_LAYOUT_P sh_ms_bitfield_layout_p
@@ -800,8 +793,6 @@ sh_option_override (void)
   int regno;
 
   SUBTARGET_OVERRIDE_OPTIONS;
-  if (optimize > 1 && !optimize_size)
-target_flags |= MASK_SAVE_ALL_TARGET_REGS;
 
   sh_cpu = PROCESSOR_SH1;
   assembler_dialect = 0;
@@ -7037,30 +7028,6 @@ calc_live_regs (HARD_REG_SET *live_regs_mask)
   if (nosave_low_regs && reg == R8_REG)
 	break;
 }
-  /* If we have a target register optimization pass after prologue / epilogue
- threading, we need to assume all target registers will be live even if
- they aren't now.  */
-  if (flag_branch_target_load_optimize2 && TARGET_SAVE_ALL_TARGET_REGS)
-for (reg = LAST_TARGET_REG; reg >= FIRST_TARGET_REG; reg--)
-  if ((! call_really_used_regs[reg] || interrupt_handler)
-	  && ! TEST_HARD_REG_BIT (*live_regs_mask, reg))
-	{
-	  SET_HARD_REG_BIT (*live_regs_mask, reg);
-	  count += GET_MODE_SIZE (REGISTER_NATURAL_MODE (reg));
-	}
-  /* If this is an interrupt handler, we don't have any call-clobbered
- registers we can conveniently use for target register save/restore.
- Make sure we save at least one general purpose register when we need
- to save target registers.  */
-  if (interrupt_handler
-  && hard_reg_set_intersect_p (*live_regs_mask,
-   reg_class_contents[TARGET_REGS])
-  && ! hard_reg_set_intersect_p (*live_regs_mask,
- 

[SH][committed] Simplify DImode add, sub, neg patterns

2016-05-31 Thread Oleg Endo
Hi,

The attached patch simplifies some DImode patterns on SH.  The
force_reg in the expand patterns can also be expressed by using the
appropriate predicate, which eliminates the need for the expand
patterns altogether.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236927.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.md (adddi3, subdi3, negdi2, abs2): Remove
define_expand patterns.
(adddi3_compact): Rename to adddi3.
(subdi3_compact): Rename to subdi3.
(*negdi2): Rename to negdi2.
(*abs2): Rename to abs2.diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 406721d..30948ca 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -1535,18 +1535,7 @@
 ;; Addition instructions
 ;; -
 
-(define_expand "adddi3"
-  [(set (match_operand:DI 0 "arith_reg_operand")
-	(plus:DI (match_operand:DI 1 "arith_reg_operand")
-		 (match_operand:DI 2 "arith_operand")))]
-  ""
-{
-  operands[2] = force_reg (DImode, operands[2]);
-  emit_insn (gen_adddi3_compact (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
-(define_insn_and_split "adddi3_compact"
+(define_insn_and_split "adddi3"
   [(set (match_operand:DI 0 "arith_reg_dest")
 	(plus:DI (match_operand:DI 1 "arith_reg_operand")
 		 (match_operand:DI 2 "arith_reg_operand")))
@@ -1938,21 +1927,10 @@
 ;; Subtraction instructions
 ;; -
 
-(define_expand "subdi3"
-  [(set (match_operand:DI 0 "arith_reg_operand" "")
-	(minus:DI (match_operand:DI 1 "arith_reg_or_0_operand" "")
-		  (match_operand:DI 2 "arith_reg_operand" "")))]
-  ""
-{
-  operands[1] = force_reg (DImode, operands[1]);
-  emit_insn (gen_subdi3_compact (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
-(define_insn_and_split "subdi3_compact"
+(define_insn_and_split "subdi3"
   [(set (match_operand:DI 0 "arith_reg_dest")
 	(minus:DI (match_operand:DI 1 "arith_reg_operand")
-		 (match_operand:DI 2 "arith_reg_operand")))
+		  (match_operand:DI 2 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
@@ -4393,13 +4371,7 @@
 
 ;; Don't split into individual negc insns immediately so that neg:DI (abs:DI)
 ;; can be combined.
-(define_expand "negdi2"
-  [(parallel [(set (match_operand:DI 0 "arith_reg_dest")
-		   (neg:DI (match_operand:DI 1 "arith_reg_operand")))
-	  (clobber (reg:SI T_REG))])]
-  "TARGET_SH1")
-
-(define_insn_and_split "*negdi2"
+(define_insn_and_split "negdi2"
   [(set (match_operand:DI 0 "arith_reg_dest")
 	(neg:DI (match_operand:DI 1 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
@@ -4480,13 +4452,7 @@
 }
   [(set_attr "type" "arith")])
 
-(define_expand "abs2"
-  [(parallel [(set (match_operand:SIDI 0 "arith_reg_dest")
-		   (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
-	  (clobber (reg:SI T_REG))])]
-  "TARGET_SH1")
-
-(define_insn_and_split "*abs2"
+(define_insn_and_split "abs2"
   [(set (match_operand:SIDI 0 "arith_reg_dest")
   	(abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
(clobber (reg:SI T_REG))]


Re: [RX] Add support for atomic operations

2016-05-31 Thread Oleg Endo
On Tue, 2016-05-31 at 14:17 +0100, Nick Clifton wrote:

> > Sorry, but my original patch was buggy.  There are two problems:
> 
> Thanks for your diligence in checking the patch.

Just eating my own dogfood here ... :)

> Approved - please apply.

Committed as r236926.

Cheers,
Oleg


  1   2   3   4   5   6   7   8   >