[Bug target/54089] [SH] Refactor shift patterns

2023-10-14 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #105 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #104)
> I've been thinking about something. I suspect that this patch could take
> work away from other patches. I'm sorry, I don't know how to express myself
> properly. I mean there's several patches that corrects shift patterns and
> 'tst' instruction generation (most of them are written by you, by the way).
> I suspect that some of them might not run anymore because this patch looks
> more general and should cover more cases, including yet unknown cases, I
> hope. And, in the end, dead code may appear because of it. I hope I was able
> to make my point clearly.

Yes, I understand what you're saying.  As other parts of the compiler evolve,
the RTL input that the backend code has to work with changes.  It's a normal
thing that happens during the course of development.  Some patterns might stop
working (especially those combine patterns are prone to that).  And sometimes
things magically start working because something got fixed somewhere else.

I've tried to add SH specific test cases to try and keep it in check.  Ideally
we'd have to go through all of the specific SH quirks in the backend
periodically, try to remove them one by one and run tests to see if the
patterns are still working/needed or whether they can be removed.

Let me know if you have more test cases (that work or don't work).

[Bug target/54089] [SH] Refactor shift patterns

2023-10-14 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #104 from Alexander Klepikov  
---
(In reply to Oleg Endo from comment #103)
> Sorry, I've been busy with other things.  I think your patch is OK, but I
> wanted to test it in more detail before committing it.

Oh, it's OK. By the way, your contribution is not less than mine.

I've been thinking about something. I suspect that this patch could take work
away from other patches. I'm sorry, I don't know how to express myself
properly. I mean there's several patches that corrects shift patterns and 'tst'
instruction generation (most of them are written by you, by the way). I suspect
that some of them might not run anymore because this patch looks more general
and should cover more cases, including yet unknown cases, I hope. And, in the
end, dead code may appear because of it. I hope I was able to make my point
clearly.

[Bug target/54089] [SH] Refactor shift patterns

2023-10-12 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #103 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #102)
> Created attachment 55543 [details]
> Arithmetic right shift late expanding v2
> 
> Here's the patch. I hope I did not miss anything.
> 

Sorry, I've been busy with other things.  I think your patch is OK, but I
wanted to test it in more detail before committing it.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-14 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Alexander Klepikov  changed:

   What|Removed |Added

  Attachment #55503|0   |1
is obsolete||
  Attachment #55513|0   |1
is obsolete||

--- Comment #102 from Alexander Klepikov  
---
Created attachment 55543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543=edit
Arithmetic right shift late expanding v2

Here's the patch. I hope I did not miss anything.

Now considering regexp. I remade it using 'check-function-bodies' command and
now it looks less confusing. I also found that in this testcase right shift
expands to 'shad' instructions early on platforms that have dynamic shift
support, so I deleted checks for those CPUs.

I don't like that 'check-function-bodies' ignores asm labels but it's better
than nothing.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-13 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #101 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #100)
> Created attachment 55513 [details]
> Arithmetic right shift late expanding
> 
> (In reply to Oleg Endo from comment #99)
> > Meanwhile, here's my iteration on your patch.
> 
> Thank you! I did all checks I did before, added testcases, and edited to fit
> the code style.

Looks OK.  Just 3 things:

> +++ gcc-13.1.0/gcc/testsuite/gcc.target/sh/pr54089-11.c   2023-07-07 
> 08:56:41.212345982 +0300
> @@ -0,0 +1,37 @@
> +/* Check that 'tst #64,r0' genrated.  */
> +/* { dg-do compile }  */
> +/* { dg-options "-O2" }  */

Please rename this test to pr49263... to have all tst #imm,r0 related tests in
one place.

Also:
  - 'genrated' -> 'generated'
  - space before opening paren of function args
  - 2 spaces indention
  - similarly, check code style of other added tests

> --- gcc-13.1.0.orig/gcc/testsuite/gcc.target/sh/pr54089-12.c  1970-01-01 
> 03:00:00.0 +0300

Can you try out Segher's suggestion in #c88 to make the regex look less
cluttered?

[Bug target/54089] [SH] Refactor shift patterns

2023-07-10 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #100 from Alexander Klepikov  
---
Created attachment 55513
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55513=edit
Arithmetic right shift late expanding

(In reply to Oleg Endo from comment #99)
> Meanwhile, here's my iteration on your patch.

Thank you! I did all checks I did before, added testcases, and edited to fit
the code style.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-09 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Oleg Endo  changed:

   What|Removed |Added

  Attachment #28207|0   |1
is obsolete||
  Attachment #28633|0   |1
is obsolete||

--- Comment #99 from Oleg Endo  ---
Created attachment 55506
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55506=edit
proposed patch

(In reply to Alexander Klepikov from comment #98)
> Created attachment 55503 [details]
> Testcase for SH specific loop optimization pass
> 

Thanks.  I'll check it out.
Meanwhile, here's my iteration on your patch.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-08 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Alexander Klepikov  changed:

   What|Removed |Added

  Attachment #55382|0   |1
is obsolete||

--- Comment #98 from Alexander Klepikov  
---
Created attachment 55503
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55503=edit
Testcase for SH specific loop optimization pass

I modified testcase for hoisting tests. It contains two different dg-final
commands at once. First uses 'scan-assembler' command and second uses
'check-function-bodies'.

The difference is that 'scan-assembler' uses "plain" regexp and is more
powerfull but more complicated to understand. 'check-function-bodies' is
simpler but less powerfull and requires C++, if I'm not mistaken.

They both work, but only one is needed because they do the same. I hope this
will help.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #97 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #96)
> 
> Not really. 'expand_ashiftrt' could emit
> 
> mov  #imm, r1
> shad r1,   r0
> 
> and 'mov' instruction would be loop invariant if there's a loop. It looks
> like 'ashrsi3_libcall_expanded' is a bad name. I think name
> 'ashrsi3_n_call_expanded' would be more appropriate.

Ah, yes, indeed.  I'm cleaning up your patch and will rename it.

> 
> > However, there is one case from  your previous posts in PR 49263:
> > 
> > int f_rshift(char v){
> > return v >> S;
> > }
> > 
> > This will not work on SH2 and expand to a libcall.
> 
> I think you mean SH2A, because the same is going on with SH4.
> 
> >  On SH4 the combine pass
> > converts it into:
> > 
> > Successfully matched this instruction:
> > (set (reg:SI 166)
> > (ashiftrt:SI (reg/v:SI 164 [ v ])
> > (const_int 31 [0x1f])))
> > 
> > But it doesn't even try to do so with the 'ashrsi3_n_call' pattern.  Not
> > sure why ...
> 
> Well, the same thing is going on when using vanilla GCC. It looks like it's
> happening due to char sign extension. Then instruction is catched by
> 'ashrsi2_31' pattern. In another words, it looks to me like an optimization.

It can be fixed by adding another pattern.  I'll make another patch for that
later.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-07 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #96 from Alexander Klepikov  
---
(In reply to Oleg Endo from comment #95)
> The infinite loop is in splitting of the 'ashrsi3_n_call' pattern with the
> constant 1.  This is because 'ashrsi3_n_call' match overlaps with the 'shar'
> pattern.

Yes, I confirm that. 'operands[2] != const1_rtx' works well, thank you!

> Another point is that you had the
> 'cfun->machine->ashrsi3_libcall_expanded++;' in the wrong place.  It was
> counting up even if it wouldn't have emitted the libcall.

Not really. 'expand_ashiftrt' could emit

mov  #imm, r1
shad r1,   r0

and 'mov' instruction would be loop invariant if there's a loop. It looks like
'ashrsi3_libcall_expanded' is a bad name. I think name
'ashrsi3_n_call_expanded' would be more appropriate.

> However, there is one case from  your previous posts in PR 49263:
> 
> int f_rshift(char v){
> return v >> S;
> }
> 
> This will not work on SH2 and expand to a libcall.

I think you mean SH2A, because the same is going on with SH4.

>  On SH4 the combine pass
> converts it into:
> 
> Successfully matched this instruction:
> (set (reg:SI 166)
> (ashiftrt:SI (reg/v:SI 164 [ v ])
> (const_int 31 [0x1f])))
> 
> But it doesn't even try to do so with the 'ashrsi3_n_call' pattern.  Not
> sure why ...

Well, the same thing is going on when using vanilla GCC. It looks like it's
happening due to char sign extension. Then instruction is catched by
'ashrsi2_31' pattern. In another words, it looks to me like an optimization.

[Bug target/54089] [SH] Refactor shift patterns

2023-07-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #95 from Oleg Endo  ---
(In reply to Oleg Endo from comment #93)
> (In reply to Alexander Klepikov from comment #92)
> > I remembered why I used two different insns - first to eliminate infinite
> > loop with help of marking insn with attribute, and second because I could
> > not set attribute when emitting insn from C code. Whe have 'get_attr_*'
> > functions but we have not 'set_attr_*'.
> 
> Yes, I thought so.  I'll give it a try myself with your patch ... please
> hold on.

Finally could have a look at it, sorry for the delay.

The infinite loop is in splitting of the 'ashrsi3_n_call' pattern with the
constant 1.  This is because 'ashrsi3_n_call' match overlaps with the 'shar'
pattern.

One quick fix is to put the condition into the first string.  A nicer way would
be to use a predicate which would constrain the operand[2] to "not 1".  But
it's the first and only use so quick version is OK.

In addition, that pattern (not only the split condition) should be matched only
before register allocation, so the 'can_create_pseudo_p' check is moved.

Another point is that you had the 'cfun->machine->ashrsi3_libcall_expanded++;'
in the wrong place.  It was counting up even if it wouldn't have emitted the
libcall.

This seems to work:

(define_insn_and_split "ashrsi3_n_call"
  [(set (match_operand:SI 0 "arith_reg_dest")
(ashiftrt:SI (match_operand:SI 1 "arith_reg_operand")
 (match_operand:SI 2 "const_int_operand")))
(clobber (reg:SI T_REG))]
  "TARGET_SH1 && can_create_pseudo_p () && operands[2] != const1_rtx"
  "#"
  "&& 1"
  [(const_int 0)]
{
  if (expand_ashiftrt (operands))
DONE;

  if (dump_file)
fprintf(dump_file, "ashrsi3_n_call: Expanding ashrsi3 libcall\n");

  cfun->machine->ashrsi3_libcall_expanded++;

  int value = INTVAL (operands[2]) & 31;

  char func[18];
  sprintf (func, "__ashiftrt_r4_%d", value);

  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);

  rtx wrk = gen_reg_rtx (Pmode);
  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
  DONE;
})


However, there is one case from  your previous posts in PR 49263:

int f_rshift(char v){
return v >> S;
}

This will not work on SH2 and expand to a libcall.  On SH4 the combine pass
converts it into:

Successfully matched this instruction:
(set (reg:SI 166)
(ashiftrt:SI (reg/v:SI 164 [ v ])
(const_int 31 [0x1f])))

But it doesn't even try to do so with the 'ashrsi3_n_call' pattern.  Not sure
why ...

[Bug target/54089] [SH] Refactor shift patterns

2023-06-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #94 from Segher Boessenkool  ---
(In reply to Alexander Klepikov from comment #92)
> I remembered why I used two different insns - first to eliminate infinite
> loop with help of marking insn with attribute, and second because I could
> not set attribute when emitting insn from C code. Whe have 'get_attr_*'
> functions but we have not 'set_attr_*'.

An attribute is part of the instruction *definition*, the define_insn; it isn't
a property you put on one instance of it.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-23 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #93 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #92)
> I remembered why I used two different insns - first to eliminate infinite
> loop with help of marking insn with attribute, and second because I could
> not set attribute when emitting insn from C code. Whe have 'get_attr_*'
> functions but we have not 'set_attr_*'.

Yes, I thought so.  I'll give it a try myself with your patch ... please hold
on.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-23 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #92 from Alexander Klepikov  
---
I remembered why I used two different insns - first to eliminate infinite loop
with help of marking insn with attribute, and second because I could not set
attribute when emitting insn from C code. Whe have 'get_attr_*' functions but
we have not 'set_attr_*'.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-22 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Alexander Klepikov  changed:

   What|Removed |Added

  Attachment #55367|0   |1
is obsolete||

--- Comment #91 from Alexander Klepikov  
---
Created attachment 55382
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55382=edit
Draft libcall/sh_loop patch, causes infinite loop

Please patch with '-p2'.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #90 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #89)
> Here's what I did
> ...

Can you attach it here as a patch please?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-22 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #89 from Alexander Klepikov  
---
Here's what I did

sh.md:

(define_expand "ashrsi3"
  [(parallel [(set (match_operand:SI 0 "arith_reg_dest" "")
   (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand" "")
(match_operand:SI 2 "nonmemory_operand" "")))
  (clobber (reg:SI T_REG))])]
  ""
{
  if (expand_ashiftrt (operands))
DONE;
  if (!CONST_INT_P (operands[2]))
FAIL;
  int value = INTVAL (operands[2]) & 31;
  emit_insn (gen_ashrsi3_n_call (operands[0], operands[1], GEN_INT(value)));
  DONE;
})

(define_insn_and_split "ashrsi3_n_call"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(ashiftrt:SI (match_operand:SI 1 "arith_reg_operand" "0")
(match_operand:SI 2 "const_int_operand")))
(clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "#"
  "&& can_create_pseudo_p ()"
  [(const_int 0)]
{
char func[18];
int value;
rtx wrk;

cfun->machine->ashrsi3_libcall_expanded++;

if (expand_ashiftrt (operands))
  DONE;
if (!CONST_INT_P (operands[2]))
  FAIL;
value = INTVAL (operands[2]) & 31;

if (dump_file)
  fprintf(dump_file, "ashrsi3_n_call: Expanding ashrsi3 libcall\n");

wrk = gen_reg_rtx (Pmode);
/* Load the value into an arg reg and call a helper.  */
emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
sprintf (func, "__ashiftrt_r4_%d", value);
rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
DONE;
})


sh.cc:

bool
expand_ashiftrt (rtx *operands)
{
  rtx wrk;
  int value;

  if (TARGET_DYNSHIFT)
{
  if (!CONST_INT_P (operands[2]))
{
  rtx count = copy_to_mode_reg (SImode, operands[2]);
  emit_insn (gen_negsi2 (count, count));
  emit_insn (gen_ashrsi3_d (operands[0], operands[1], count));
  return true;
}
  else if (ashiftrt_insns[INTVAL (operands[2]) & 31]
   > 1 + SH_DYNAMIC_SHIFT_COST)
{
  rtx count
= force_reg (SImode, GEN_INT (- (INTVAL (operands[2]) & 31)));
  emit_insn (gen_ashrsi3_d (operands[0], operands[1], count));
  return true;
}
}
  if (!CONST_INT_P (operands[2]))
return false;

  value = INTVAL (operands[2]) & 31;

  if (value == 31)
{
  /* If we are called from abs expansion, arrange things so that we
 we can use a single MT instruction that doesn't clobber the source,
 if LICM can hoist out the load of the constant zero.  */
  if (currently_expanding_to_rtl)
{
  emit_insn (gen_cmpgtsi_t (force_reg (SImode, CONST0_RTX (SImode)),
operands[1]));
  emit_insn (gen_mov_neg_si_t (operands[0], get_t_reg_rtx ()));
  return true;
}
  emit_insn (gen_ashrsi2_31 (operands[0], operands[1]));
  return true;
}
  else if (value >= 16 && value <= 19)
{
  wrk = gen_reg_rtx (SImode);
  emit_insn (gen_ashrsi2_16 (wrk, operands[1]));
  value -= 16;
  while (value--)
gen_ashift (ASHIFTRT, 1, wrk);
  emit_move_insn (operands[0], wrk);
  return true;
}
  /* Expand a short sequence inline, longer call a magic routine.  */
  else if (value <= 5)
{
  wrk = gen_reg_rtx (SImode);
  emit_move_insn (wrk, operands[1]);
  while (value--)
gen_ashift (ASHIFTRT, 1, wrk);
  emit_move_insn (operands[0], wrk);
  return true;
}
  return false;
}


Now GCC falls into infinite loop when compiling this:

int f_short_rshift_signed(char v){
return v >> 5;
}

It looks like it splits and then combines again.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #88 from Segher Boessenkool  ---
(In reply to Oleg Endo from comment #85)
> > +/* { dg-final { scan-assembler 
> > "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:"
> >  { target { ! has_dyn_shift } } } }  */
> 
> Can you try to somehow write this in a simpler way?  Maybe omit some of the
> register number matches, as they don't matter etc.

Do not use double-quoted strings unless you need interpolation?  If you use {}
around the string you do not need to backslash-quote (and double-quote) so much
at all.

[0-9] is \d

whitespace is \s

See the Tcl re_syntax manual page :-)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-21 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #87 from Segher Boessenkool  ---
(In reply to Oleg Endo from comment #53)
> (In reply to Segher Boessenkool from comment #52)
> > There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for 
> > if
> > you really do not want the instruction combiner to create particular
> > instruction patterns (but it does nothing to prevent other parts of the
> > compiler from doing the same!)
> 
> Thanks for pointing it out.  I knew I missed something recent ;)

g:78e4f1ad4e48, from 2012?  Well, fairly recent, okay :-)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-21 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #86 from Alexander Klepikov  
---
(In reply to Oleg Endo from comment #85)
>For now, can you try to run the style check script on your changes?

Thank you, that's what I've been missing!

> > bool
> > expand_ashiftrt (rtx *operands)
> > {
> >   int value = INTVAL (operands[2]) & 31;
> ^^
> 
> Missing check for 'const_int' operand.  See other uses of 'CONST_INT_P'.

I was completely sure that condition '(match_operand:SI 2 "const_int_operand")'
in define_insn_and_split guarantees that 'CONST_INT_P (operands[2]) == true'.

> I think the 'define_insn "ashrsi3_libcall_collapsed"' and
> 'define_insn_and_split "ashrsi3_libcall_expand"' can be merged into a single
> pattern 'define_insn_and_split "ashrsi3_n_call" and the function
> 'expand_ashrsi3_libcall' (the tail of the original 'expand_ashiftrt') can be
> just put into the splitter code in the new "ashrsi3_n_call".
> 
> The splitter condition should include "&& can_create_pseudo_p ()" to make
> sure it's ran before register allocation.
> 
> I think this will reduce the change set a bit and make the intention of the
> patch a bit clearer.
> 

Yes, it looks like it works in general. But first tests show that all constant
dynamic shifts expand to library call for all targets, even for those with
dynshift instructions. That's because 'ashrsi3_n_call' and 'ashrsi3' should
check whether right shift could be expanded to individual shifts. They both
should execute the code that I separated into 'expand_ashiftrt_individual'
function to expand to individual shifts properly.

> > +/* { dg-final { scan-assembler 
> > "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:"
> >  { target { ! has_dyn_shift } } } }  */
> 
> Can you try to somehow write this in a simpler way?  Maybe omit some of the
> register number matches, as they don't matter etc.

I took a note, I'll deal with it later.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-20 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #85 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #83)
> Created attachment 55367 [details]
> Collapsed libcall and additional loop move invariants patch v3

Thanks for staying on it!  I've looked through the latest version of your
patch.

There are still formatting issues.  I will not point out one by one at this
time (but will so later in the end).  For now, can you try to run the style
check script on your changes?

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=contrib/check_GNU_style.sh

and also briefly cross check with https://gcc.gnu.org/codingconventions.html


As for the actual contents of the patch...


> bool
> expand_ashiftrt (rtx *operands)
> {
>   int value = INTVAL (operands[2]) & 31;
^^

Missing check for 'const_int' operand.  See other uses of 'CONST_INT_P'.

> if (dump_file)
>fprintf(dump_file, "ashrsi3: Emitting collapsed libcall\n");

This can be omitted.  It will be obvious in the RTL dump after the expand pass
because of the insn name, or not?


'expand_ashiftrt' is called only in the pattern 'define_expand "ashrsi3"', so
we can move all the code into the expander, like it's done in e.g.
'define_expand "lshrsi3"'.

Then the function 'expand_ashiftrt_individual' can be renamed back to
'expand_ashiftrt'.  

Actually, the 'define_expand "ashrsi3"' can just emit the
'ashrsi3_libcall_collapsed' directly.  In other words, change the original
'expand_ashiftrt' function tail:

>  wrk = gen_reg_rtx (Pmode);
>
> /* Load the value into an arg reg and call a helper.  */
>  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
>  sprintf (func, "__ashiftrt_r4_%d", value);
>  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
>  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
>  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
>  return true;

to this:

>  emit_insn (gen_ashrsi3_libcall_collapsed (operands[0], operands[1], 
> GEN_INT(value)));
>  return true;

... but of course only if operand[2] is actually a 'const_int'.  If operand[2]
is not a constant, then the whole expander should fail and not emit anything. 
Which is what the original code was doing.  In case of shifts by non-constant
amounts, the middle-end will then expand the generic libcall for it (if I
remember correctly).

So basically, all we're doing here is adding a proxy pattern for the existing
'ashrsi3_n', that has a simpler structure and can be used by other passes like
combine easier.  Sounds plausible, but looking through the other shift
patterns, they would all need that treatment?

I think the 'define_insn "ashrsi3_libcall_collapsed"' and
'define_insn_and_split "ashrsi3_libcall_expand"' can be merged into a single
pattern 'define_insn_and_split "ashrsi3_n_call" and the function
'expand_ashrsi3_libcall' (the tail of the original 'expand_ashiftrt') can be
just put into the splitter code in the new "ashrsi3_n_call".

The splitter condition should include "&& can_create_pseudo_p ()" to make sure
it's ran before register allocation.

I think this will reduce the change set a bit and make the intention of the
patch a bit clearer.

> +/* { dg-final { scan-assembler 
> "_f_loop1_rshift:.*mov\.l\\t(\.L\[0-9\]+),(r\[0-9\]+).*sts.l\\tpr,@-r15.*(\.L\[0-9\]+):.*jsr\\t@\\2.*bf\.s\\t\\3.*\\1:\\n\\t\.long\\t___ashiftrt_r4_6.*_f_loop2_rshift:"
>  { target { ! has_dyn_shift } } } }  */

Can you try to somehow write this in a simpler way?  Maybe omit some of the
register number matches, as they don't matter etc.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-20 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #84 from Alexander Klepikov  
---
I've forgot to say that first I ran all tests with SH specific loop
optimization enabled when condition 'optimize && flag_move_loop_invariants' is
true. And only then I ran all tests with final (at the moment) version of patch
once more.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-20 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #83 from Alexander Klepikov  
---
Created attachment 55367
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55367=edit
Collapsed libcall and additional loop move invariants patch v3

I digged other targets and I found that cfun->machine can point to user-defined
structure. So I implemented per-function data through it. Loop optimizer now
runs when -O2 or higher is specified and only for functions where ashrsi3
libcall was expanded. I also ran all tests I did before a few times.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #82 from Alexander Klepikov  
---
I have read the docs and other targets' code, and the puzzle finally came
together. A struct with additional current function context is a really good
idea! I'll implement it in a couple of days.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #81 from Alexander Klepikov  
---
(In reply to Oleg Endo from comment #78)
> The compiler processes one function at a time, all passes at once.  It
> doesn't mix passes of different functions.  So I think using global variable
> in sh.cc + override 'set_current_function' should get the job done.  When
> the insn split code is executed, just set the global flag in the SH specific
> function context.

I made a proof of concept using  a global variable only. It is defined in sh.cc
and it is defined as extern in sh_loop.cc. It is set when splitting is done and
it is cleared when sh_loop_done is called, i.e. at the end of loop
optimization. Do we really need 'set_current_function'? Because so far it seems
a little excessive.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #80 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #79)
> 
> I mean that if a user run GCC with -O1 and we don't do SH specific loop move
> invariants pass at -O1, a user could look at binary (or at .S file) and find
> that not all invariants are moved out of the loops, because library
> addresses are hoisted after split1 pass. That would confuse a user even if
> RTL dump is generated, and he can decide that it's a bug into loop2 pass. I
> suggest to generate sh_loop dumps if RTL dump is requested and place there a
> message that sh_loop hoisting is not done due to optimization level setting.

I don't think users would get surprised or anything by lack of certain
optimization. There is no official set of guaranteed optimizations performed at
a particular -O level.  It's OK to output a message into the RTL dump of
course.  But everything else seems a bit overthinking the issue.

Actually the hoisting might not be always done.  E.g. when register pressure in
a loop is high, it might be better to not hoist the function address and keep
it inside of the loop to reduce register pressure.  But I'm not sure the loop
optimization passes are smart enough to do that (yet).

> 
> I checked several cases and they are because ld cannot link little endian
> binary. When I run failed command taken from log file, it always fails for
> linking little endian binary. But sometimes logs say that executing of
> little endian binary is successful. I'm at a loss - how is that even
> possible?

Ugh, sounds weird. Sometimes a particular binutils version is also no good. 
Have you tried using an older binutils?  Maybe it's more stable?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #79 from Alexander Klepikov  
---
(In reply to Oleg Endo from comment #78)
> (In reply to Alexander Klepikov from comment #77)
> > > It'd be good if the newly added passes are ran only with -O2 or higher.
> > 
> > This can be confusing to users when they discover that not all invariants
> > are moved out of loops. Then we should inform them about that at least.
> 
> I don't think the compiler reports any optimizations not done to the user at
> lower optimization levels?  It's normal not to do certain optimizations at a
> lower level, that's why we have -O0 -O1 -O2 ... or do you mean something
> else by that?
> 

I mean that if a user run GCC with -O1 and we don't do SH specific loop move
invariants pass at -O1, a user could look at binary (or at .S file) and find
that not all invariants are moved out of the loops, because library addresses
are hoisted after split1 pass. That would confuse a user even if RTL dump is
generated, and he can decide that it's a bug into loop2 pass. I suggest to
generate sh_loop dumps if RTL dump is requested and place there a message that
sh_loop hoisting is not done due to optimization level setting.

But, actually, it's not that important at the moment because I'm aiming to do
the second option - run sh_loop only when it's potentially needed.

> The compiler processes one function at a time, all passes at once.

That's what I missed! Thank you for clarification!

> > I see some strange new exec fails only at testsuite logs. Right now I'm
> > trying to find the cause.
> 
> What's the configure line of your GCC build?

../gcc-13.1.0/configure --target=sh-elf --with-endian=big,little
--disable-bootstrap --enable-languages=c,c++ --disable-nls --with-gnu-as
--with-gnu-ld --prefix=/usr/local/sh-toolchain/ --without-newlib
--without-headers

I checked several cases and they are because ld cannot link little endian
binary. When I run failed command taken from log file, it always fails for
linking little endian binary. But sometimes logs say that executing of little
endian binary is successful. I'm at a loss - how is that even possible?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #78 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #77)
> > It'd be good if the newly added passes are ran only with -O2 or higher.
> 
> This can be confusing to users when they discover that not all invariants
> are moved out of loops. Then we should inform them about that at least.

I don't think the compiler reports any optimizations not done to the user at
lower optimization levels?  It's normal not to do certain optimizations at a
lower level, that's why we have -O0 -O1 -O2 ... or do you mean something else
by that?

> I'm thinking about this for some time. We know that we should potentially
> run additional loop optimization pass when we're splitting libcall. I did
> not find the way to know in what function we are splitting yet.

The compiler processes one function at a time, all passes at once.  It doesn't
mix passes of different functions.  So I think using global variable in sh.cc +
override 'set_current_function' should get the job done.  When the insn split
code is executed, just set the global flag in the SH specific function context.


> I see some strange new exec fails only at testsuite logs. Right now I'm
> trying to find the cause.

What's the configure line of your GCC build?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-17 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #77 from Alexander Klepikov  
---
> It'd be good if the newly added passes are ran only with -O2 or higher.

This can be confusing to users when they discover that not all invariants are
moved out of loops. Then we should inform them about that at least.

>  Or even better, if we can find a way to enable them only when actually 
> needed. 
> E.g. when it's splitting a shift insn that will potentially need the loop
> optimizations again, set a flag in the function.

I'm thinking about this for some time. We know that we should potentially run
additional loop optimization pass when we're splitting libcall. I did not find
the way to know in what function we are splitting yet.

> However, to get better test coverage, it's better first let the additional
> loop passes run all the time to uncover any potential issues.  Later the
> above can be added as a supplementary optimization.

I see some strange new exec fails only at testsuite logs. Right now I'm trying
to find the cause.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-16 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #76 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #75)
> I found that patch incorrectly works when '-O0 -fmove-loop-invariants' flags
> are set. Stock loop optimization passes do not run when '-O0' is specified
> desipte '-fmove-loop-invariants' is set. I'll do the same and recheck again.

It'd be good if the newly added passes are ran only with -O2 or higher.  Or
even better, if we can find a way to enable them only when actually needed. 
E.g. when it's splitting a shift insn that will potentially need the loop
optimizations again, set a flag in the function.

One way to do that could be adding SH specific function context class/struct to
store some per-function data for the compilation process.  Right now we'd have
only one entry, that is whether to run the additional passes again or not. 
This per-function 'SH context' can be stored as a global variable in sh.cc. 
Then override 'set_current_function' in the backend to reset the SH specific
per-function context.

However, to get better test coverage, it's better first let the additional loop
passes run all the time to uncover any potential issues.  Later the above can
be added as a supplementary optimization.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-16 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #75 from Alexander Klepikov  
---
I found that patch incorrectly works when '-O0 -fmove-loop-invariants' flags
are set. Stock loop optimization passes do not run when '-O0' is specified
desipte '-fmove-loop-invariants' is set. I'll do the same and recheck again.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-14 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Alexander Klepikov  changed:

   What|Removed |Added

  Attachment #55284|0   |1
is obsolete||

--- Comment #74 from Alexander Klepikov  
---
Created attachment 55321
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55321=edit
Collapsed libcall and additional loop move invariants patch v2

[Bug target/54089] [SH] Refactor shift patterns

2023-06-14 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #73 from Alexander Klepikov  
---
I had to check everything a few more times because I found a bug in my patch
that caused the long shifts to expand incorrectly. I added a test that checks
correct shifts expansion in this regard.

I also managed to make a test that verifies hoisting. It can be done with
'scan-assembler' command and regexp.

If you're interested, there's more powerful command 'check-function-bodies',
but it looks like it works only when C++ cross compiler is built, so I decided
to leave the regexp.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #72 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #71)
> 
> > * Do we really need to add that new source file sh_loop.cc with the wrapper
> > classes?  Can't we just instantiate the passes that are needed in-place when
> > they are registered?
> 
> Unfortunately, no.
> [...]

That's too bad.  Thanks for digging.  I haven't checked myself recently, I
thought this infrastructure has been improved a little bit, but seems not so. 
Perhaps SH is the only backend that wants to do this kind of thing.  Let's
leave it at that for now.  Later, after everything else has been cleared out we
can ask other GCC developers' opinion on this.

> Yes, of course! I'll add some 'tst #imm,0' presence tests.

That'd be good.

> Should I add hoist test? I do not yet understand how hoisting check should be
> performed, but I hope to find examples.

Also not sure how to do that at the moment.  Maybe scanning RTL dumps of the
new extra passes?  Or perhaps just a simple example as a start:

void test_func (int* a_ptr, int* b_ptr, unsigned int count)
{
  for (unsigned int i = 0; i < count; ++i)
a_ptr[i] >> 5;


  for (unsigned int i = 0; i < count; ++i)
b_ptr[i] >> 5;
}

... and then just scan the asm output and check against the expected number of
insns.  In this case, mov.l insn to load the function call address.


> Maybe some other tests? If I'm missing something, please tell me.

Right now not that I can think of.


> And as I understand, I should name test file something like
> pr54089-next-free-number.c, right?

Yes, that should be fine.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #71 from Alexander Klepikov  
---
> There are some formatting nits in your patch, please check again:

Thanks for pointing that out, I'll check again.

> But more interestingly:
> * Do we really need to add that new source file sh_loop.cc with the wrapper
> classes?  Can't we just instantiate the passes that are needed in-place when
> they are registered?

Unfortunately, no.

First of all, classes are defined in loop-init.cc and thus cannot be accessed
directly.

There are wrappers called 'make_pass_rtl_blah_bla_blah' which create class
instances, but. Class .name field is hardcoded, there's no option to set it
when initializing, and pass manager checks if new pass name differs from any
already existing. Here's the code from passes.cc:

if (pass->type == new_pass_info->pass->type
  && pass->name
  && !strcmp (pass->name, new_pass_info->reference_pass_name) //Name
check
  ...

Moreover, pass_rtl_loop_init::execute method calls a function that do the
following check:

gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);

And in our pass (that goes after split1 pass),
current_ir_type()!=IR_RTL_CFGLAYOUT, so we will get ICE. And even if we could
inherit the class, 'execute' method is final.

I'm not so good at C++, so may be there's another way. I know that the less
code is the better, but I'm afraid, not this time.

> 
> * Can you add some tests to the SH specific tests?

Yes, of course! I'll add some 'tst #imm,0' presence tests. Should I add hoist
test? I do not yet understand how hoisting check should be performed, but I
hope to find examples. Maybe some other tests? If I'm missing something, please
tell me.

And as I understand, I should name test file something like
pr54089-next-free-number.c, right?

Thank you.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #70 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #69)
> Created attachment 55284 [details]
> Collapsed libcall and additional loop move invariants patch

Thanks for sharing the patch.  I also don't have the GCC SH test environment
setup at the moment, so it will take me some time to get up to speed on that.

There are some formatting nits in your patch, please check again:
* don't add / remove empty lines for no reason
* '{' goes on new line (follow surrounding code style)
* in comments: two spaces after the period
* closing ')' and ']' in the RTL should go on the same line (follow surrounding
style)

But more interestingly:
* Do we really need to add that new source file sh_loop.cc with the wrapper
classes?  Can't we just instantiate the passes that are needed in-place when
they are registered?

* Can you add some tests to the SH specific tests?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #69 from Alexander Klepikov  
---
Created attachment 55284
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55284=edit
Collapsed libcall and additional loop move invariants patch

[Bug target/54089] [SH] Refactor shift patterns

2023-06-08 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #68 from Alexander Klepikov  
---
OK, I finished my patch. Let me remind you briefly, what it does.

First, it does not emit library call for ashrsi3 during expand pass. Instead it
emits intermediate 'collapsed' libcall insn which is expanded later at split1
pass.

Consecutive right arithmetic shifts can be catched at combine pass
and converted to 'collapsed' libcall insn.

Then 'collapsed' insn splits whether to short sequnce of individual right
shifts
or to __ashiftrt_r4_N library call during split1 pass.

Finally, loop move invariants only pass executed right after split1 pass
to hoist library addresses that potentially were emitted during split pass.

I ran GCC testsuite and there were 1 new failed test that actually turned out
to be an optimization, and 3 tests that failed on clean GCC and passed on
patched. I could not set up my environment to test all little endian exec
tests, so for those I don't have results. Also I successfully compiled (but not
linked due to not fully set environment) Linux kenel 6.1. My homemade
performance tests (I timed linux kernel build) showed a slowdown of less then
3%.

To reduce slowdown it would be good to run loop move invariants pass only in
functions that have expanded libcall, but to do so it is necessary to mark such
funcnion when emitting insn and I'm not so sure if insn knows about in what
function it is located.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #67 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #66)
> (In reply to Alexander Klepikov from comment #65)
> > > I'm thinking of something else.
> > 
> > During kernel compile I got few internal errors in different passes. That
> > is, additional loop optimization pass patch is no good at all.
> 
> I am sorry, I am, as always, panicking ahead of time. I think I found what's
> wrong and do additional tests.

Don't worry.  I know what you're going through.  Been there myself ;)
Take your time.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #66 from Alexander Klepikov  
---
(In reply to Alexander Klepikov from comment #65)
> > I'm thinking of something else.
> 
> During kernel compile I got few internal errors in different passes. That
> is, additional loop optimization pass patch is no good at all.

I am sorry, I am, as always, panicking ahead of time. I think I found what's
wrong and do additional tests.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #65 from Alexander Klepikov  
---
> I'm thinking of something else.

During kernel compile I got few internal errors in different passes. That is,
additional loop optimization pass patch is no good at all.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #64 from Alexander Klepikov  
---
> We have to consider that SH is also a linux target and it's also used to
> build larger applications (and GCC itself ...).  It'd be good to not regress
> too much in this regard.  One way to check it is the CSiBE test set.  I can
> help testing your patch later.

I agree with you. Thank you!

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #63 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #62)
> 
> My project is small and it compiles in under 1 second on both clean and
> patched GCC. sh.exp test suite mean run time is 117s on clean and 119s on
> patched. I did 1 warm-up run and then 3 main one-threaded runs for each
> task. I'm thinking of something else.

We have to consider that SH is also a linux target and it's also used to build
larger applications (and GCC itself ...).  It'd be good to not regress too much
in this regard.  One way to check it is the CSiBE test set.  I can help testing
your patch later.

> 
> Implementing features not supported by hardware will always be a tradeoff.

I'd say it's generally about how to find the best choice of
instructions/sequences.  With GCC's "waterfall" optimization it's impossible to
find the best solution because it doesn't have a way to compare the total cost
of one solution vs. another, for example.  We have to work around that ;)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #62 from Alexander Klepikov  
---
> I'm a bit concerned about the increased compile time.  Have you observed
> anything (negative) in this regard?

My project is small and it compiles in under 1 second on both clean and patched
GCC. sh.exp test suite mean run time is 117s on clean and 119s on patched. I
did 1 warm-up run and then 3 main one-threaded runs for each task. I'm thinking
of something else.

> 
> Loop, hoist, constant propagation etc (+ all the good stuff) optimizations
> are done before insn combine / split1.  We could add a simple SH specific
> pass that goes over the RTL and does stuff to shifts before those other
> optimizations.

That's a good idea! 

> However, it might miss insn combine opportunities later on. 

Implementing features not supported by hardware will always be a tradeoff.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #61 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #60)
> > Maybe it's easier to add some shift specific passes.
> 
> Well, I couldn't think of anything better and ported the loop optimization
> pass. More precisely, all loop optimization passes, because they are tied to
> each other. They run after split1 pass. And it worked!
> 
> I want to beleive that another loop optimization pass won't break anything
> because loops are already optimized.

Theoretically it should't ... 

> 
> If that's redundant, I thought of expanding libcall if it's inside a loop
> before loop optimization passes.

I'm a bit concerned about the increased compile time.  Have you observed
anything (negative) in this regard?

Loop, hoist, constant propagation etc (+ all the good stuff) optimizations are
done before insn combine / split1.  We could add a simple SH specific pass that
goes over the RTL and does stuff to shifts before those other optimizations. 
However, it might miss insn combine opportunities later on.  I'm thinking about
your tst #imm,r0 case from before.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-06 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #60 from Alexander Klepikov  
---
> Maybe it's easier to add some shift specific passes.

Well, I couldn't think of anything better and ported the loop optimization
pass. More precisely, all loop optimization passes, because they are tied to
each other. They run after split1 pass. And it worked!

I want to beleive that another loop optimization pass won't break anything
because loops are already optimized.

If that's redundant, I thought of expanding libcall if it's inside a loop
before loop optimization passes.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-04 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #59 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #58)
> 
> Ouch. That's a real problem. Short loops can become slower on about 10%. But
> is it possible to detect a loop during expand pass? It looks like basic
> blocks have loop depth info, but I still don't now how to access a basic
> block.

There is 'BLOCK_FOR_INSN'

But I'm not sure it will be helpful during the initial expand pass.


> I would try that. I think loop optimiztion pass should be repeated after
> split pass. Do you know how to do it?

I don't know if we can simply repeat the loop optimizations.  I think I've
tried doing something like that before -- repeating some of the RTL passes, but
without any useful results.  It could also result in oscillations (pass A does
a transformation, pass B undoes it, then pass A would do it again ...).  Maybe
you can get better results.

There are already 2 SH specific passes 'sh_optimize_sett_clrt' and
'sh_treg_combine'.  You can look at how they are instantiated and inserted into
the RTL passes chain in 'register_sh_passes'.

Maybe it's easier to add some shift specific passes.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-04 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #58 from Alexander Klepikov  
---
> Another thing ... the reason why it's desirable to expand into the libcall
> earlier is to allow hoisting the function call address outside of loops and
> things like that.

Ouch. That's a real problem. Short loops can become slower on about 10%. But is
it possible to detect a loop during expand pass? It looks like basic blocks
have loop depth info, but I still don't now how to access a basic block.

> That happens only once at the RTL level during
> compilation and that's before the combine pass.  Since there are no loops
> around optimization passes in the compiler, it's always going to be a
> compromise.  But it might be OK to repeat a particular optimization pass
> only for SH, if it helps anything.

I would try that. I think loop optimiztion pass should be repeated after split
pass. Do you know how to do it?

[Bug target/54089] [SH] Refactor shift patterns

2023-06-03 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #57 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #56)
> 
> > In that test you can see the unnecessary push/pop of PR.  This is because
> > initially it wanted to expand as a library call, but then your patterns
> > decided to change the insns. This can or can't be avoided, depending on the
> > case.
> 
> That's an valuable observation, thank you! I found why. Sometimes 'collapsed
> libcall' instruction is emitted in combine pass and 'clobber PR' is set
> then. I commented this out and checked that file and it seems to be OK. I
> will rerun testsuite again to be sure.

Another thing ... the reason why it's desirable to expand into the libcall
earlier is to allow hoisting the function call address outside of loops and
things like that.  That happens only once at the RTL level during compilation
and that's before the combine pass.  Since there are no loops around
optimization passes in the compiler, it's always going to be a compromise.  But
it might be OK to repeat a particular optimization pass only for SH, if it
helps anything.

> 
> Now regarding tests that fail on vanilla GCC and pass with patch. It looks
> like something was changed in the middle of GCC and it started to produce
> other instruction patterns that cannot be catched by existing isns and thus
> expanded to library calls. The patch simply fixed it.

Yes, that happens every now and then, as folks work on target independent
optimizations in the middle-end.  SH backend hasn't been keeping up for a while
in this regard.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-03 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #56 from Alexander Klepikov  
---
> > Regarding testsuite. There's execute fails, but this is due to lack of
> > multilib. I'll rebuild and retest.
> > 
> > There's also fail in pr64345-1.c, in this function:
> > [...]
> > 
> > But it looks more like it's not a fail, but an optimization.
> 
> Yeah, that looks like an improvement.  There might be some SH specific tests
> that scan for particular assembler outputs like that one.  Those tests would
> need to be adjusted of course.

I checked and that one was the only one.

> In that test you can see the unnecessary push/pop of PR.  This is because
> initially it wanted to expand as a library call, but then your patterns
> decided to change the insns. This can or can't be avoided, depending on the
> case.

That's an valuable observation, thank you! I found why. Sometimes 'collapsed
libcall' instruction is emitted in combine pass and 'clobber PR' is set then. I
commented this out and checked that file and it seems to be OK. I will rerun
testsuite again to be sure.

Now regarding tests that fail on vanilla GCC and pass with patch. It looks like
something was changed in the middle of GCC and it started to produce other
instruction patterns that cannot be catched by existing isns and thus expanded
to library calls. The patch simply fixed it.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-03 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #55 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #54)
> Regarding testsuite. There's execute fails, but this is due to lack of
> multilib. I'll rebuild and retest.
> 
> There's also fail in pr64345-1.c, in this function:
> [...]
> 
> But it looks more like it's not a fail, but an optimization.

Yeah, that looks like an improvement.  There might be some SH specific tests
that scan for particular assembler outputs like that one.  Those tests would
need to be adjusted of course.

In that test you can see the unnecessary push/pop of PR.  This is because
initially it wanted to expand as a library call, but then your patterns decided
to change the insns. This can or can't be avoided, depending on the case.

> 
> But also there's tests that pass on patched but fail on clean. I'll take a
> closer look on them later after GCC and multilibs rebuild.

Yes, there are some (well ... quite a lot actually) tests that will also fail
on vanilla GCC on SH.  Hence the need to look at the test result delta
before/after patch.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-03 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #54 from Alexander Klepikov  
---
Regarding testsuite. There's execute fails, but this is due to lack of
multilib. I'll rebuild and retest.

There's also fail in pr64345-1.c, in this function:

typedef signed char int8_t;

int test_int8_t__shift_7_8 (int8_t* x) {
 return ((*x >> 7) ^ 1) & 1;
}

Clean GCC:

.file   "pr64345-1-test.c"
.text
.text
.align 1
.align 2
.global _test_int8_t__shift_7_8
.type   _test_int8_t__shift_7_8, @function
_test_int8_t__shift_7_8:
mov.l   .L4,r1
sts.l   pr,@-r15
jsr @r1
mov.b   @r4,r4
mov r4,r0
tst #1,r0
movtr0
lds.l   @r15+,pr
rts
nop
.L5:
.align 2
.L4:
.long   ___ashiftrt_r4_7
.size   _test_int8_t__shift_7_8, .-_test_int8_t__shift_7_8
.ident  "GCC: (GNU) 13.1.0"


Patched GCC:

.file   "pr64345-1-test.c"
.text
.text
.align 1
.align 2
.global _test_int8_t__shift_7_8
.type   _test_int8_t__shift_7_8, @function
_test_int8_t__shift_7_8:
mov.b   @r4,r1
sts.l   pr,@-r15
cmp/pz  r1
movtr0
lds.l   @r15+,pr
rts
nop
.size   _test_int8_t__shift_7_8, .-_test_int8_t__shift_7_8
.ident  "GCC: (GNU) 13.1.0"

But it looks more like it's not a fail, but an optimization.

But also there's tests that pass on patched but fail on clean. I'll take a
closer look on them later after GCC and multilibs rebuild.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #53 from Oleg Endo  ---
(In reply to Segher Boessenkool from comment #52)
> 
> There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for if
> you really do not want the instruction combiner to create particular
> instruction patterns (but it does nothing to prevent other parts of the
> compiler from doing the same!)

Thanks for pointing it out.  I knew I missed something recent ;)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #52 from Segher Boessenkool  ---
(In reply to Alexander Klepikov from comment #50)
> But maybe there is a way to exclude particular insn from combine pass? (I
> guess not).

In general, it is best to let combine just work on everything.  It will not
replace instructions if the replacement is more expensive, and it will only
ever create instruction sequences with the same semantics as what it started
with.

There is TARGET_LEGITIMATE_COMBINED_INSN though, which is a workaround for if
you really do not want the instruction combiner to create particular
instruction patterns (but it does nothing to prevent other parts of the
compiler from doing the same!)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #51 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #50)
> 
> Ooh, my bad! You are absolutely right. A function is inlined and division is
> converted to 4 'shar's which at combine pass are catched by 'define_insn
> "ashrsi3_libcall_collapsed"' which is little unexpected to me. Of course, it
> is expanded at 'split' pass to lib call. 5 and less right shifts should not
> convert to a lib call, but that can be easily corrected.
> 
> But maybe there is a way to exclude particular insn from combine pass? (I
> guess not).
> 

I don't think there is a direct way to hide patterns from the combine pass,
although I could be wrong, since I haven't been tracking the combine pass
development closely for a while.

Perhaps you can look at how the left-shifts are done as a reference.


> Now concerning GCC testsuite. I ran it in such way:
> 
> make check RUNTESTFLAGS="CFLAGS='$CFLAGS -I/usr/local/sh-elf/include/'
> --target_board=sh-sim\{-m2e,-m4\}\{-ml,-mb\}"
> 
> There are lots of failed tests on non-patched GCC. Even if I narrow tests
> list to sh.exp, it still fails a lot of times. At last there's nothing in
> logs that produce ICE in RTL. I'm not sure I could find a crash due to the
> patch at all, even if it were there.

To test it, you first use a known good version of GCC without your patch, build
the whole toolchain from scratch and run it.  Then collect the test results.

Then apply your patch, and repeat the process.  Collect the test results again.

Then compare both test results.  If there are any new changes in the test
results your patch is hitting something.

Usually I'd run the testsuite as follows:

make -k check
RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

(running make with multiple jobs here is useful, and still it will take some
hours)

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #50 from Alexander Klepikov  
---
> > I've found that my patch catches integer division. In short, it appears to
> > work unpredictable. It looks like there's no easy way to catch right shift.
> 
> What do you mean it catches integer division? There could be several places
> during compilation, where the compiler will try to convert integer division
> to right shifts.  But it will not do so unless it's wrong.  The patterns you
> have added shouldn't match a division operation.

Ooh, my bad! You are absolutely right. A function is inlined and division is
converted to 4 'shar's which at combine pass are catched by 'define_insn
"ashrsi3_libcall_collapsed"' which is little unexpected to me. Of course, it is
expanded at 'split' pass to lib call. 5 and less right shifts should not
convert to a lib call, but that can be easily corrected.

But maybe there is a way to exclude particular insn from combine pass? (I guess
not).

Here are the files that confused me:

$ cat pr49880-3.c
/* Check that the option -mdiv=call-table works.  */
/* { dg-do link }  */
/* { dg-options "-mdiv=call-table" }  */

int
test00 (int a, int b)
{
  return a / b;
}

unsigned int
test01 (unsigned int a, unsigned b)
{
  return a / b;
}

int
main (int argc, char** argv)
{
  return test00 (argc, 123) + test01 (argc, 123);
}

Not-patched GCC

$ cat pr49880-3.s.clean
.file   "pr49880-3.c"
.text
.text
.align 1
.align 2
.global _test00
.type   _test00, @function
_test00:
mov.l   .L4,r0
sts.l   pr,@-r15
jsr @r0
nop
lds.l   @r15+,pr
rts
nop
.L5:
.align 2
.L4:
.long   ___sdivsi3
.size   _test00, .-_test00
.align 1
.align 2
.global _test01
.type   _test01, @function
_test01:
mov.l   .L8,r0
sts.l   pr,@-r15
jsr @r0
nop
lds.l   @r15+,pr
rts
nop
.L9:
.align 2
.L8:
.long   ___udivsi3
.size   _test01, .-_test01
.section.text.startup,"ax",@progbits
.align 1
.align 2
.global _main
.type   _main, @function
_main:
mov.l   .L11,r1
dmuls.l r1,r4
sts mach,r0
sharr0
sharr0
sharr0
sharr0
mov r4,r1
shllr1
subcr1,r1
sub r1,r0
mov.l   .L12,r1
dmulu.l r1,r4
sts mach,r1
sub r1,r4
shlrr4
add r4,r1
shlr2   r1
shlr2   r1
shlr2   r1
rts
add r1,r0
.L13:
.align 2
.L11:
.long   558694933
.L12:
.long   174592167
.size   _main, .-_main
.ident  "GCC: (GNU) 13.1.0"

Patched GCC

$ cat pr49880-3.s
.file   "pr49880-3.c"
.text
.text
.align 1
.align 2
.global _test00
.type   _test00, @function
_test00:
mov.l   .L4,r0
sts.l   pr,@-r15
jsr @r0
nop
lds.l   @r15+,pr
rts
nop
.L5:
.align 2
.L4:
.long   ___sdivsi3
.size   _test00, .-_test00
.align 1
.align 2
.global _test01
.type   _test01, @function
_test01:
mov.l   .L8,r0
sts.l   pr,@-r15
jsr @r0
nop
lds.l   @r15+,pr
rts
nop
.L9:
.align 2
.L8:
.long   ___udivsi3
.size   _test01, .-_test01
.section.text.startup,"ax",@progbits
.align 1
.align 2
.global _main
.type   _main, @function
_main:
mov.l   .L12,r2
mov r4,r1
sts.l   pr,@-r15
dmuls.l r2,r4
mov.l   .L13,r2
jsr @r2
sts mach,r4
mov r1,r2
shllr2
subcr2,r2
mov r4,r0
sub r2,r0
mov.l   .L14,r2
dmulu.l r2,r1
sts mach,r2
sub r2,r1
shlrr1
add r1,r2
shlr2   r2
shlr2   r2
shlr2   r2
add r2,r0
lds.l   @r15+,pr
rts
nop
.L15:
.align 2
.L12:
.long   558694933
.L13:
.long   ___ashiftrt_r4_4
.L14:
.long   174592167
.size   _main, .-_main
.ident  "GCC: (GNU) 13.1.0"



Now concerning GCC testsuite. I ran it in such way:

make check RUNTESTFLAGS="CFLAGS='$CFLAGS -I/usr/local/sh-elf/include/'
--target_board=sh-sim\{-m2e,-m4\}\{-ml,-mb\}"

There are lots of failed tests on non-patched GCC. Even if I narrow tests list
to sh.exp, it still fails a lot of times. At last there's nothing in logs that
produce ICE in RTL. I'm not sure I could find a crash due to the patch at all,
even if it were there.

And finally, 'parallel' appears to be unnecessary, thank you for pointing.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #49 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #48)
> Let's continue discussion we started here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
> 
> I've found that my patch catches integer division. In short, it appears to
> work unpredictable. It looks like there's no easy way to catch right shift.

What do you mean it catches integer division? There could be several places
during compilation, where the compiler will try to convert integer division to
right shifts.  But it will not do so unless it's wrong.  The patterns you have
added shouldn't match a division operation.

[Bug target/54089] [SH] Refactor shift patterns

2023-06-01 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Alexander Klepikov  changed:

   What|Removed |Added

 CC||klepikov.alex+bugs at gmail 
dot co
   ||m

--- Comment #48 from Alexander Klepikov  
---
Let's continue discussion we started here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

I've found that my patch catches integer division. In short, it appears to work
unpredictable. It looks like there's no easy way to catch right shift.

[Bug target/54089] [SH] Refactor shift patterns

2018-11-19 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #47 from Oleg Endo  ---
Thanks for the reminder.  The issue hasn't been fully resolved. Please leave
this PR open.  This will be helpful when getting back to the issue.

[Bug target/54089] [SH] Refactor shift patterns

2018-11-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #46 from Martin Liška  ---
Can the bug be marked as resolved?

[Bug target/54089] [SH] Refactor shift patterns

2016-05-06 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #45 from Oleg Endo  ---
Author: olegendo
Date: Fri May  6 09:41:57 2016
New Revision: 235950

URL: https://gcc.gnu.org/viewcvs?rev=235950=gcc=rev
Log:
gcc/
PR target/54089
* config/sh/sh.md (*rotcr): Add another variant.

gcc/testsuite/
PR target/54089
* gcc.target/sh/pr54089-1.c (test_24): Add new sub-test.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c

[Bug target/54089] [SH] Refactor shift patterns

2016-02-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #44 from Oleg Endo  ---
Author: olegendo
Date: Mon Feb 22 13:33:31 2016
New Revision: 233601

URL: https://gcc.gnu.org/viewcvs?rev=233601=gcc=rev
Log:
gcc/
PR target/69806
PR target/54089
* config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p, sh_dynamicalize_shift_p):
Handle negative shift counts.
* config/sh/sh.md (ashlsi3, lshrsi3_n, lshrsi3_n_clobbers_t): Don't use
force_reg on the shift constant.
(lshrsi3): Likewise.  Expand into lshrsi3_n* instead of lshrsi3_d.
(lshrsi3_d): Handle negative shift counts.

gcc/testsuite/
PR target/69806
PR target/54089
* gcc.target/sh/pr54089-10.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr54089-10.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/54089] [SH] Refactor shift patterns

2015-08-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Segher Boessenkool segher at gcc dot gnu.org changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #41 from Segher Boessenkool segher at gcc dot gnu.org ---
On the other hand, there are quite a few archs (more modern ones
mostly) where the Linux kernel uses the real libgcc.  And the SH
kernel maintainers won't complain anyway (SH is orphaned).


[Bug target/54089] [SH] Refactor shift patterns

2015-08-11 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #40 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Rich Felker from comment #39)
 Oleg, do you have rights under your copyright assignment agreement to
 dual-license the libgcc part of the patch under GPLv2 so it could be
 included in the kernel, and if so, would you be willing to do so?

If I understand it correctly, it should be OK.  Please contact me directly
about this.  If possible, please also add the other people who want this to the
CC list.

 I agree it
 would be cleaner for the kernel not to duplicate libgcc code, but getting in
 a simple patch to update the code that's there would be a lot easier than
 the build-architectural change of using libgcc itself in the kernel, which
 is an issue I'd really rather not fight with the kernel developers on if
 they disagree. :-)

A quick search for libgcc linux kernel tells the story about this. 
Unfortunately, you are right.


[Bug target/54089] [SH] Refactor shift patterns

2015-08-11 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #42 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #41)
 On the other hand, there are quite a few archs (more modern ones
 mostly) where the Linux kernel uses the real libgcc.

If this is the case (could you please name a few of those archs just for
reference?) ...

 And the SH
 kernel maintainers won't complain anyway (SH is orphaned).

... then this issue now is a very good opportunity to fix the actual problem
(remove libgcc code from the kernel and link it against the proper libgcc).


[Bug target/54089] [SH] Refactor shift patterns

2015-08-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #43 from Segher Boessenkool segher at gcc dot gnu.org ---
The archs that already use the real libgcc are arc, cris, hexagon,
m32r, nios2, openrisc, parisc, xtensa.

grep -w LIBGCC ~/src/kernel/arch/*/Makefile


[Bug target/54089] [SH] Refactor shift patterns

2015-08-10 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #39 from Rich Felker bugdal at aerifal dot cx ---
Oleg, do you have rights under your copyright assignment agreement to
dual-license the libgcc part of the patch under GPLv2 so it could be included
in the kernel, and if so, would you be willing to do so? I agree it would be
cleaner for the kernel not to duplicate libgcc code, but getting in a simple
patch to update the code that's there would be a lot easier than the
build-architectural change of using libgcc itself in the kernel, which is an
issue I'd really rather not fight with the kernel developers on if they
disagree. :-)


[Bug target/54089] [SH] Refactor shift patterns

2015-05-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Rich Felker bugdal at aerifal dot cx changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #33 from Rich Felker bugdal at aerifal dot cx ---
The commit in comment 16 broke Linux (the kernel) and nobody seems to have
noticed since 2012...


[Bug target/54089] [SH] Refactor shift patterns

2015-05-30 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #34 from Oleg Endo olegendo at gcc dot gnu.org ---
Thanks for spotting and pin pointing.  Do you have a bit more info, maybe a
reduced test case?


[Bug target/54089] [SH] Refactor shift patterns

2015-05-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #35 from Rich Felker bugdal at aerifal dot cx ---
No, but the cause is simple -- the kernel doesn't use libgcc but defines its
own versions of these functions, and has the old ones but not the new ones. I
tried just removing the kernel's copies and using libgcc.a but it produced a
non-bootable kernel and I didn't test further; my guess would be some
relocations in the libgcc.a versions are incompatible with the way the kernel
is built.


[Bug target/54089] [SH] Refactor shift patterns

2015-05-30 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #36 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Rich Felker from comment #35)
 No, but the cause is simple -- the kernel doesn't use libgcc but defines its
 own versions of these functions, and has the old ones but not the new ones.

Yeah, that's what happens when other SW relies on the implementation details. 
I don't think it's a problem of GCC, but rather the kernel.  You can try to
copy the new functions from the GCC source to the kernel source.


[Bug target/54089] [SH] Refactor shift patterns

2015-05-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #37 from Rich Felker bugdal at aerifal dot cx ---
I expect this is going to be problematic from a license perspective unless they
can be licensed under GPLv2...


[Bug target/54089] [SH] Refactor shift patterns

2015-05-30 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #38 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Rich Felker from comment #33)
 The commit in comment 16 broke Linux (the kernel) and nobody seems to have
 noticed since 2012...

There aren't that many (publicly known) users of Linux on SH2 or SH2A.  If they
don't speak up and stay in the underground, nobody will notice :)


(In reply to Rich Felker from comment #37)
 I expect this is going to be problematic from a license perspective unless
 they can be licensed under GPLv2...

In this case, adjust the functions in the kernel to match the interface of the
compiler.  However, maybe it's better to figure out how to have libgcc in the
kernel to avoid this kind of situation in the future.


[Bug target/54089] [SH] Refactor shift patterns

2014-12-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #32 from Oleg Endo olegendo at gcc dot gnu.org ---
Author: olegendo
Date: Tue Dec 16 21:37:42 2014
New Revision: 218795

URL: https://gcc.gnu.org/viewcvs?rev=218795root=gccview=rev
Log:
gcc/testsuite/
PR target/54089
* gcc.target/sh/pr54089-1.c: Change optimization level from -O1 to -O2.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c


[Bug target/54089] [SH] Refactor shift patterns

2014-05-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #31 from Oleg Endo olegendo at gcc dot gnu.org ---
Author: olegendo
Date: Fri May 16 23:12:19 2014
New Revision: 210537

URL: http://gcc.gnu.org/viewcvs?rev=210537root=gccview=rev
Log:
gcc/
PR target/54089
* config/sh/predicates.md (negt_reg_shl31_operand): Match additional
patterns.
* config/sh/sh.md (*negt_msb): Merge SH2A and non-SH2A variants.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh.md


[Bug target/54089] [SH] Refactor shift patterns

2013-12-17 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #30 from Oleg Endo olegendo at gcc dot gnu.org ---
A case from libmpeg2/slice.c:

mov.b   @(1,r10),r0// load of shift amount
shldr7,r6
add #1,r6
extu.b  r0,r0  // zero extend shift amount
shldr0,r1  // r1 = r0
mov r1,r0

The zero extension of the shift amount variable could be omitted because shift
amounts  31 are undefined behavior.  If the shift amount is in the valid range
of 0...31 the zero extension won't do anything.
A reduced test case:

int test33 (unsigned char* x, int y)
{
  return y  x[4];
}

results in:
mov.b   @(4,r4),r0
extu.b  r0,r0
shldr0,r5
rts
mov r5,r0


[Bug target/54089] [SH] Refactor shift patterns

2013-02-16 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #29 from Oleg Endo olegendo at gcc dot gnu.org 2013-02-16 
11:36:37 UTC ---

Another case taken from CSiBE / bzip2, where reusing the intermediate shift

result would be better:



void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )

{

   n-b[7] = (UChar)((hi32  24)  0xFF);

   n-b[6] = (UChar)((hi32  16)  0xFF);

   n-b[5] = (UChar)((hi32  8)  0xFF);

   n-b[4] = (UChar) (hi32  0xFF);

/*

   n-b[3] = (UChar)((lo32  24)  0xFF);

   n-b[2] = (UChar)((lo32  16)  0xFF);

   n-b[1] = (UChar)((lo32  8)  0xFF);

   n-b[0] = (UChar) (lo32  0xFF);

*/

}



on rev 196091 with -O2 -m4 compiles to:



mov r6,r0

shlr16  r0

shlr8   r0

mov.b   r0,@(7,r4)

mov r6,r0

shlr16  r0

mov.b   r0,@(6,r4)

mov r6,r0

shlr8   r0

mov.b   r0,@(5,r4)

mov r6,r0

mov.b   r0,@(4,r4)



which would be better as:

mov r6,r0

mov.b   r0,@(4,r4)

shlr8   r0

mov.b   r0,@(5,r4)

shlr8   r0

mov.b   r0,@(6,r4)

shlr8   r0

mov.b   r0,@(7,r4)



this would require reordering of the mem stores, which should be OK to do if

the mem is not volatile.  



Reordering the stores manually:



void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )

{

   n-b[4] = (UChar) (hi32  0xFF);

   n-b[5] = (UChar)((hi32  8)  0xFF);

   n-b[6] = (UChar)((hi32  16)  0xFF);

   n-b[7] = (UChar)((hi32  24)  0xFF);

}



still results in:



mov r6,r0

mov.b   r0,@(4,r4)

mov r6,r0

shlr8   r0

mov.b   r0,@(5,r4)

mov r6,r0

shlr16  r0

mov.b   r0,@(6,r4)

mov r6,r0

shlr16  r0

shlr8   r0

mov.b   r0,@(7,r4)



... at least this case should be handled, I think.


[Bug target/54089] [SH] Refactor shift patterns

2012-11-12 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #28 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-13 
01:13:55 UTC ---

(In reply to comment #27)



 FWIW, while the ARC600/700 have a full set up static and dynamic shifts,

 the ARC 601 with the swap option has similar issues with shifts that should

 be stitched together from multiple instructions, and preferences of unsigned

 over signed shifts.



Ah, good to know that SH is not the only case where it would make sense.

I've created another PR for this issue:  PR 55306


[Bug target/54089] [SH] Refactor shift patterns

2012-11-09 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #26 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-09 
10:48:17 UTC ---

(In reply to comment #25)

 Maybe the better solution would indeed be to add a

 arith - logical shift conversion pass before combine, or try to convert arith

 shifts in the split pass after combine by looking at the data flow of the

 shifted values.



Another idea would be to extend the combine pass, so that whenever it

internally converts arith - logical shifts, it always checks the costs of the

shift op, and if the logical shift is cheaper, always split out the logical

shift.


[Bug target/54089] [SH] Refactor shift patterns

2012-11-09 Thread amylaar at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #27 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org 
2012-11-09 13:29:10 UTC ---

(In reply to comment #26)

 (In reply to comment #25)

  Maybe the better solution would indeed be to add a

  arith - logical shift conversion pass before combine, or try to convert 
  arith

  shifts in the split pass after combine by looking at the data flow of the

  shifted values.



FWIW, while the ARC600/700 have a full set up static and dynamic shifts,

the ARC 601 with the swap option has similar issues with shifts that should

be stitched together from multiple instructions, and preferences of unsigned

over signed shifts.



 Another idea would be to extend the combine pass, so that whenever it

 internally converts arith - logical shifts, it always checks the costs of the

 shift op, and if the logical shift is cheaper, always split out the logical

 shift.



Indeed, combine's insistence that a computation has just one canonical

form, which depends on how much it known about the input/output data,

is in want of some correction.  We got define_split for things that

should become multiple instructions, but nothing for things that should

just be differently-shaped instructions, and there are no predicates available

to get to know about the (non)zero / signbits that combine knows about.



E.g. the following peephole in arc.md is only necessary because combine

does some unwanted 'simplifications' with known-zero bits:



;; -

;; Pattern 1 : r0 = r1  {i}

;; r3 = r4/INT + r0 ;;and commutative

;; ||

;; \/

;; add{i} r3,r4/INT,r1

;; -

;; ??? This should be covered by combine, alas, at times combine gets

;; too clever for it's own good: when the shifted input is known to be

;; either 0 or 1, the operation will be made into an if-then-else, and

;; thus fail to match the add_n pattern.  Example: _mktm_r, line 85 in

;; newlib/libc/time/mktm_r.c .



(define_peephole2

  [(set (match_operand:SI 0 dest_reg_operand )

(ashift:SI (match_operand:SI 1 register_operand )

   (match_operand:SI 2 const_int_operand )))

  (set (match_operand:SI 3 dest_reg_operand )

   (plus:SI (match_operand:SI 4 nonmemory_operand )

(match_operand:SI 5 nonmemory_operand )))]

  (INTVAL (operands[2]) == 1

|| INTVAL (operands[2]) == 2

|| INTVAL (operands[2]) == 3)

(true_regnum (operands[4]) == true_regnum (operands[0])

   || true_regnum (operands[5]) == true_regnum (operands[0]))

(peep2_reg_dead_p (2, operands[0]) || (true_regnum (operands[3]) ==

true_regnum (operands[0])))

 ;; the preparation statements take care to put proper operand in operands[4]

 ;; operands[4] will always contain the correct operand. This is added to

satisfy commutativity

  [(set (match_dup 3)

(plus:SI (mult:SI (match_dup 1)

  (match_dup 2))

 (match_dup 4)))]

  if (true_regnum (operands[4]) == true_regnum (operands[0]))

  operands[4] = operands[5];

   operands[2] = GEN_INT (1  INTVAL (operands[2]));

)


[Bug target/54089] [SH] Refactor shift patterns

2012-11-07 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #25 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-07 
23:31:20 UTC ---

Created attachment 28633

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28633

Arithmetic right shift rework 2



This could be an alternative approach for the arith right shifts.

Applied to rev 193240:



CSiBE -m4-single -O2 -ml -mpretend-cmove

sum:  3159747 - 3160047+300 / +0.009494 %



CSiBE -m2 -O2 -ml

sum:  3314907 - 3313319-1588 / -0.047905 %



The idea here is to let arith shifts initially go through a slightly more

complex pattern 'ashrsi3_bridge_const', which allows the lib function address

or shad constant to be CSE'ed early on.

As mentioned in comment #18, the arith - logical shift conversion is done by

combine only under certain circumstances.  After some trial and error I arrived

at the 'ashrsi3_bridge_const' pattern, which makes combine convert most of the

arith shifts into logical shifts.  In some cases it also finds more

opportunities, but regressions like this one still remain:



void MC_put_xy_16_c

(uint8_t * dest, const uint8_t * ref, const int stride, int height)

{

  // must see a shlr instead of 2x shar.

  // somehow the loop prevents this...

  do

  {

dest[0] = (((ref[0]+ref[0 +1]+(ref+stride)[0]+(ref+stride)[0 +1]+2)2));

ref += stride; dest += stride;

  } while (--height);

}



In addition to that, some unlucky register allocations appear, which cause

unneeded reg-reg copies (it seems something has trouble utilizing commutative

insns such as 'add').



The other problem is that patterns such as:



(define_insn_and_split *lshrsi3

  [(set (match_operand:SI 0 arith_reg_dest)

(lshiftrt:SI (match_operand:SI 1 arith_reg_operand)

 (match_operand:SI 2 shift_count_operand)))

   (clobber (reg:SI T_REG))

   (use (match_operand:SI 3 general_operand))

   (use (match_operand:SI 4 general_operand))]

  TARGET_SH1  can_create_pseudo_p ()  CONST_INT_P (operands[2])

  #

   1

  [(const_int 0)]

{

  emit_insn (gen_lshrsi3 (operands[0], operands[1], operands[2]));

  DONE;

})



have to be added for combine to convert more arith - logical shifts.  In this

patch I've added only one such pattern.  However, if for example zero_extract

patterns are added, variations such as above would be required, too.  This

looks discomforting to me.  Maybe the better solution would indeed be to add a

arith - logical shift conversion pass before combine, or try to convert arith

shifts in the split pass after combine by looking at the data flow of the

shifted values.


[Bug target/54089] [SH] Refactor shift patterns

2012-11-06 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #24 from Oleg Endo olegendo at gcc dot gnu.org 2012-11-06 
11:55:47 UTC ---

Author: olegendo

Date: Tue Nov  6 11:55:43 2012

New Revision: 193236



URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=193236

Log:

PR target/54089

* config/sh/sh.c (and_xor_ior_costs, addsubcosts): Double the costs for

ops larger than SImode.

* config/sh/sh.md (rotcl, *rotcl): New insns and splits.

(ashldi3_k): Convert to insn_and_split and use new rotcl insn.



PR target/54089

* gcc.target/sh/pr54089-8.c: New.

* gcc.target/sh/pr54089-9.c: New.





Added:

trunk/gcc/testsuite/gcc.target/sh/pr54089-8.c

trunk/gcc/testsuite/gcc.target/sh/pr54089-9.c

Modified:

trunk/gcc/ChangeLog

trunk/gcc/config/sh/sh.c

trunk/gcc/config/sh/sh.md

trunk/gcc/testsuite/ChangeLog


[Bug target/54089] [SH] Refactor shift patterns

2012-10-16 Thread amylaar at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org changed:



   What|Removed |Added



 CC||amylaar at gcc dot gnu.org



--- Comment #22 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org 
2012-10-16 10:52:46 UTC ---

(In reply to comment #0)

 The code related to shift patterns in sh.c / sh.md maybe could use some

 improvements here and there.  In some places clobbers and scratch regs could 
 be

 avoided.

 There is also a large part that deals with minimizing and-shift/shift-and

 sequences, but there are no test cases to verify that those actually work.

 It would also be interesting to see, whether some of the and-shift/shift-and

 combine code could be reduced due to improvements in the middle-end.



Be careful with removing 'useless' clobbers, as they might be needed to

facilitate instruction combinations into patterns that have these clobbers.


[Bug target/54089] [SH] Refactor shift patterns

2012-10-16 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #23 from Oleg Endo olegendo at gcc dot gnu.org 2012-10-16 
11:49:14 UTC ---

(In reply to comment #22)

 (In reply to comment #0)

  The code related to shift patterns in sh.c / sh.md maybe could use some

  improvements here and there.  In some places clobbers and scratch regs 
  could be

  avoided.

  There is also a large part that deals with minimizing and-shift/shift-and

  sequences, but there are no test cases to verify that those actually work.

  It would also be interesting to see, whether some of the and-shift/shift-and

  combine code could be reduced due to improvements in the middle-end.

 

 Be careful with removing 'useless' clobbers, as they might be needed to

 facilitate instruction combinations into patterns that have these clobbers.



Yeah, I've noticed ;)

Logical left/right shifts are now expanded into T-clobbering and

non-T-clobbering pattern variations.


[Bug target/54089] [SH] Refactor shift patterns

2012-09-30 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #21 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-30 
18:45:56 UTC ---

I've noticed that there seems to be a problem with register allocation related

to shift insns.  For example in the CSiBE set, I've seen sequences such as



mov.w   .L342,r2

mov #0,r7

.align 2

.L340:

mov r7,r1

add r2,r1

sharr1

mov r1,r3  

shll2   r3 

mov r3,r0  

mov.l   @(r0,r5),r3

cmp/gt  r4,r3

bf  0f



quite often.  Not sure why this happens.  Maybe because 'gen_shifty_op' (which

emits the shift sequence insns) is called after reload and not during the first

split pass after combine...


[Bug target/54089] [SH] Refactor shift patterns

2012-09-25 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #20 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-25 
19:06:34 UTC ---

Author: olegendo

Date: Tue Sep 25 19:06:28 2012

New Revision: 191743



URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191743

Log:

PR target/54089

* config/sh/constraints.md (Jhb): New constraint.

* config/sh/predicates.md (negt_reg_shl31_operand): New predicate.

* config/sh/sh.md (rotrsi3): New expander.

(rotrsi3_1, *rotrsi3_1, *rotlsi3_1): New insns.

(rotlsi3, rotlhi3): Use const_int_operand predicate instead of

immediate_operand and remove CONST_INT_P checks in expansion code.

(*rotcr): Cleanup variable usage.  Handle preceding nott insn.  Add

split with swapped operands.

(*rotcr_neg_t, *movt_msb, *negt_msb): New insns and splits.



PR target/54089

* gcc.target/sh/pr54089-1.c (test_15, test_16, test_17, test_18,

test_19, test_20, test_21, test_22, test_23): New functions.

* gcc.target/sh/pr54089-4.c: New.

* gcc.target/sh/pr54089-5.c: New.

* gcc.target/sh/pr54089-6.c: New.

* gcc.target/sh/pr54089-7.c: New.





Added:

trunk/gcc/testsuite/gcc.target/sh/pr54089-4.c

trunk/gcc/testsuite/gcc.target/sh/pr54089-5.c

trunk/gcc/testsuite/gcc.target/sh/pr54089-6.c

trunk/gcc/testsuite/gcc.target/sh/pr54089-7.c

Modified:

trunk/gcc/ChangeLog

trunk/gcc/config/sh/constraints.md

trunk/gcc/config/sh/predicates.md

trunk/gcc/config/sh/sh.md

trunk/gcc/testsuite/ChangeLog

trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c


[Bug target/54089] [SH] Refactor shift patterns

2012-09-19 Thread olegendo at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #19 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-19 
17:48:31 UTC ---

Author: olegendo

Date: Wed Sep 19 17:48:25 2012

New Revision: 191490



URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191490

Log:

PR target/54089

* config/sh/predicates.md (arith_reg_or_t_reg_operand): New predicate.

* config/sh/sh.md (*rotcr): Use arith_reg_or_t_reg_operand predicate.

Handle the case where one of the operands is T_REG.

Add new pattern to handle MSB extraction.



PR target/54089

* gcc.target/sh/pr54089-1.c (test_11, test_12, test_13, test_14): New

functions.





Modified:

trunk/gcc/ChangeLog

trunk/gcc/config/sh/predicates.md

trunk/gcc/config/sh/sh.md

trunk/gcc/testsuite/ChangeLog

trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c


[Bug target/54089] [SH] Refactor shift patterns

2012-09-17 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #18 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-17 
23:29:52 UTC ---
Created attachment 28207
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28207
Arithmetic right shift rework

I have tried to apply the same strategies to the arithmetic right shift
patterns as for logical right/left shift patterns.
What happens then is that combine will see arithmetic right shift patterns (for
shift amounts  1) and will fail to convert arithmetic right shifts to logical
right shifts where it would actually be possible.  This results in bigger code
since arithmetic shifts on SH are more expensive than logical shifts.

Unfortunately, combine will only do the shift type conversion if it can combine
the logical shift with some other insn -- which is not very likely on SH. 
Thus, the logical shift is discarded.  If however, no arithmetic shift pattern
is there that matches the shift amount (which is currently the case), combine
will split out the logical shift.

Not having arithmetic right shift patterns makes it difficult to write combine
patterns for things like ... 

char test (int x)
{
  return x  24;
}

which ends up as:
-m4:
mov#-24,r1
shadr1,r4
rts
exts.br4,r0

-m2:
mov.l.L3,r1
sts.lpr,@-r15
jsr@r1
nop
mov
lds.l@r15+,pr
rts
nop
.L4:
.align 2
.L3:
.long___ashiftrt_r4_24


On the other hand, the following is handled almost as expected:

unsigned char test (int a)
{
  return a  24;
}

-m4:
movr4,r0
shlr16r0
rts
shlr8r0

-m2:
mov.l.L3,r1
sts.lpr,@-r15
jsr@r1
nop
extu.br4,r0
lds.l@r15+,p
rts
nop
.L4:
.align 2
.L3:
.long___ashiftrt_r4_24


According to my understanding, to fix those issues something like the attached
patch would be required.  However, it would also require a separate arithmetic
right shift to logical right shift conversion pass.


[Bug target/54089] [SH] Refactor shift patterns

2012-09-10 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #16 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-10 
20:35:30 UTC ---
Author: olegendo
Date: Mon Sep 10 20:35:25 2012
New Revision: 191161

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191161
Log:
PR target/54089
* config/sh/sh.h (SH_DYNAMIC_SHIFT_COST): Set always to 1 if
dynamic shifts are available.
(SHIFT_COUNT_TRUNCATED): Always define to 0.  Correct comment.
* config/sh/sh.c (ashl_lshr_seq, ext_ashl_lshr_seq): Add comments.
* config/sh/predicates.md (shift_count_operand): Allow
arith_reg_operand even if TARGET_DYNSHIFT is false.
* config/sh/sh.md (ashlsi3, lshrsi3): Expand library call patterns
if needed.
(ashlsi3_d_call, lshrsi3_d_call): New insns.

PR target/54089
* config/sh/lib1funcs.S (ashlsi3): Reimplement as ashlsi3_r0.
(lshrsi3): Reimplement as lshrsi3_r0.

PR target/54089
* gcc.target/sh/pr54089-3.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr54089-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/libgcc/ChangeLog
trunk/libgcc/config/sh/lib1funcs.S


[Bug target/54089] [SH] Refactor shift patterns

2012-09-10 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #17 from Oleg Endo olegendo at gcc dot gnu.org 2012-09-10 
21:27:30 UTC ---
Created attachment 28163
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28163
Alternative dropped software dynamic ashlsi3,lshrsi3 patch

(In reply to comment #16)
 Author: olegendo
 Date: Mon Sep 10 20:35:25 2012
 New Revision: 191161
 
 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=191161
 Log:
 PR target/54089
 * config/sh/sh.h (SH_DYNAMIC_SHIFT_COST): Set always to 1 if
 dynamic shifts are available.
 (SHIFT_COUNT_TRUNCATED): Always define to 0.  Correct comment.
 * config/sh/sh.c (ashl_lshr_seq, ext_ashl_lshr_seq): Add comments.
 * config/sh/predicates.md (shift_count_operand): Allow
 arith_reg_operand even if TARGET_DYNSHIFT is false.
 * config/sh/sh.md (ashlsi3, lshrsi3): Expand library call patterns
 if needed.
 (ashlsi3_d_call, lshrsi3_d_call): New insns.
 
 PR target/54089
 * config/sh/lib1funcs.S (ashlsi3): Reimplement as ashlsi3_r0.
 (lshrsi3): Reimplement as lshrsi3_r0.
 
 PR target/54089
 * gcc.target/sh/pr54089-3.c: New.
 
 
 Added:
 trunk/gcc/testsuite/gcc.target/sh/pr54089-3.c
 Modified:
 trunk/gcc/ChangeLog
 trunk/gcc/config/sh/predicates.md
 trunk/gcc/config/sh/sh.c
 trunk/gcc/config/sh/sh.h
 trunk/gcc/config/sh/sh.md
 trunk/gcc/testsuite/ChangeLog
 trunk/libgcc/ChangeLog
 trunk/libgcc/config/sh/lib1funcs.S

This patch is just for the record/reference.  It is a little bit more complex
than the committed patch.  The main difference is the clobber list and the
input/output regs of the shift insns.  The committed version seemed more
beneficial.


[Bug target/54089] [SH] Refactor shift patterns

2012-08-22 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #15 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-22 
22:52:25 UTC ---
Author: olegendo
Date: Wed Aug 22 22:52:17 2012
New Revision: 190603

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190603
Log:
PR target/54089
* config/sh/predicates (p27_rshift_count_operand,
not_p27_rshift_count_operand): New predicates.
* config/sh/sh.c (sh_ashlsi_clobbers_t_reg_p,
sh_lshrsi_clobbers_t_reg_p, sh_dynamicalize_shift_p): Handle special
case when shift amount is 31.
(gen_ashift): Emit gen_shlr instead of gen_lshrsi3_m.
* config/sh/sh.md (ashlsi3_d): Set type to 'dyn_shift' instead
of 'arith'.
(ashlsi_c): Rename to shll.  Adapt calls to gen_ashlsi_c throughout
the file.
(lshrsi3): Remove clobber from expander.  Use shift_count_operand
instead of nonmemory_operand predicate for second operand.  Add
handling of case lshrsi3_n_clobbers_t.
(lshrsi3_k): Use p27_rshift_count_operand for second operand.
(lshrsi3_d): Make insn_and_split.  Split dynamic shift to constant
shift sequences if beneficial.
(lshrsi3_n): Make insn_and_split.  Split constant shift sequence to
dynamic shift if beneficial.
(lshrsi3_n_clobbers_t): New insn_and_split.
(lshrsi3_m): Delete.

PR target/54089
* gcc.target/sh/pr54089-2.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr54089-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog


[Bug target/54089] [SH] Refactor shift patterns

2012-08-20 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #14 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-20 
21:29:26 UTC ---
For the record, I've committed the following under the wrong PR number by
accident.

Author: olegendo
Date: Mon Aug 20 20:54:20 2012
New Revision: 190545

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190545
Log:
PR target/50489
* config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and splits.
(ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
* config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
* config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare it.

PR target/50489
* gcc.target/sh/pr54089-1.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog


[Bug target/54089] [SH] Refactor shift patterns

2012-08-16 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #13 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-16 
23:13:18 UTC ---
Author: olegendo
Date: Thu Aug 16 23:13:11 2012
New Revision: 190457

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190457
Log:
PR target/54089
* config/sh/sh.md (ashlsi3_d): Do not split if it would result
in a T_REG clobber.  Correct comment.
(ashlsi3_n): Correct comment.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md


[Bug target/54089] [SH] Refactor shift patterns

2012-08-11 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #12 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-11 
20:25:45 UTC ---
(In reply to comment #9)
 (In reply to comment #8)
  #define SH_DYNAMIC_SHIFT_COST (TARGET_DYNSHIFT ? 1 : 20)
 
 Sounds reasonable.  Perhaps some historical reason for the original
 one, though I don't know why.  Could you run SCiBE tests with it?

I've checked result-size for -m2a-single -mb -O2 ...
In total there's a decrease of 1728 bytes, with a few cases where there are
increases.  I've briefly checked out why code gets bigger in those cases.  As
far as I can see at the moment, one reason is because right shifts are
dynamicalized (converted to dynamic shift insns) although it's not really
beneficial.

I think I'll try to brush up the right shifts patterns first, then try again.


[Bug target/54089] [SH] Refactor shift patterns

2012-08-10 Thread rmansfield at qnx dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #10 from Ryan Mansfield rmansfield at qnx dot com 2012-08-10 
14:24:55 UTC ---
Created attachment 27985
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27985
preprocessed src

 ./xgcc -B. -w ~/ice2.i -c -Os 
/home/ryan/ice2.i: In function 'tg_extent':
/home/ryan/ice2.i:81:2: error: unrecognizable insn:
  }
  ^
(insn 189 163 190 9 (set (reg:SI 227)
(ashift:SI (reg:SI 147 t)
(const_int 1 [0x1]))) /home/ryan/ice2.i:75 -1
 (nil))
/home/ryan/ice2.i:81:2: internal compiler error: in extract_insn, at
recog.c:2125
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


[Bug target/54089] [SH] Refactor shift patterns

2012-08-10 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #11 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-10 
15:40:07 UTC ---
(In reply to comment #10)
 Created attachment 27985 [details]
 preprocessed src
 
  ./xgcc -B. -w ~/ice2.i -c -Os 
 /home/ryan/ice2.i: In function 'tg_extent':
 /home/ryan/ice2.i:81:2: error: unrecognizable insn:
   }
   ^
 (insn 189 163 190 9 (set (reg:SI 227)
 (ashift:SI (reg:SI 147 t)
 (const_int 1 [0x1]))) /home/ryan/ice2.i:75 -1
  (nil))
 /home/ryan/ice2.i:81:2: internal compiler error: in extract_insn, at
 recog.c:2125
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See http://gcc.gnu.org/bugs.html for instructions.

Thanks for the test case!  It turns out that this is basically the same ICE as
triggered by your test case in PR 39423.  I will include this test case in the
next patch for PR 39423.


[Bug target/54089] [SH] Refactor shift patterns

2012-08-09 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

Oleg Endo olegendo at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kkojima at gcc dot gnu.org

--- Comment #4 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 21:54:42 
UTC ---
I'm currently playing around with the macro SHIFT_COUNT_TRUNCATED (sh.h) and
the target hook TARGET_SHIFT_TRUNCATION_MASK (which is not implemented yet).
Doing the following on rev 190259 (which is actually wrong):

sh.h:
-#define SHIFT_COUNT_TRUNCATED (! TARGET_SH3  ! TARGET_SH2A)
+#define SHIFT_COUNT_TRUNCATED (1)

sh.c:
+/* Implement the TARGET_SHIFT_TRUNCATION_MASK target hook.  */
+
+#undef TARGET_SHIFT_TRUNCATION_MASK
+#define TARGET_SHIFT_TRUNCATION_MASK sh_shift_truncation_mask
+
+static unsigned HOST_WIDE_INT
+sh_shift_truncation_mask (enum machine_mode mode)
+{
+  int bitsize = GET_MODE_BIT_SIZE (mode);
+
+  if (TARGET_SHMEDIA)
+return bitsize - 1;
+
+  return MAX (32, bitsize) - 1;
+}

... and looking at some CSiBE size results, I see the a couple of weird cases
similar to what happens to the set_bh_page function in
linux-2.4.23-pre3-testplatform/fs/buffer.c.

Without the changes:
   ...
.L592:
mov.l   .L598,r1 !! 
mov r9,r3
mov.l   @r1,r2   !! r2 = zone_table[0]
add #124,r2
mov.l   @(36,r2),r1
mov.l   @(32,r2),r2
add r10,r1
mov.l   r9,@(56,r8)
sub r2,r3
mov r3,r2
mov.l   .L599,r3
sharr2
sharr2
mul.l   r3,r2
mov #12,r3
sts macl,r2
shldr3,r2
add r2,r1
mov.l   r1,@(52,r8)

With the changes:
...
.L592:
mov.l   @(24,r8),r0   !! 
mov r8,r3
mov.l   .L598,r1
shlr16  r0!!
shlr8   r0!!
shll2   r0!!
mov.l   @(r0,r1),r2   !! r2 = zone_table[page-flags  ZONE_SHIFT]
add #124,r2
mov.l   @(36,r2),r1
mov.l   @(32,r2),r2
add r10,r1
mov.l   r8,@(56,r9)
sub r2,r3
mov r3,r2
mov.l   .L599,r3
sharr2
sharr2
mul.l   r3,r2
mov #12,r3
sts macl,r2
shldr3,r2
add r2,r1
mov.l   r1,@(52,r9)

It seems that without the (wrong) patch, the index in the inline function
'page_zone' is reduced from 'page-flags  ZONE_SHIFT' to '0', and thus the
resulting code is wrong?!
I've tried to reproduce this in an isolated test case but couldn't get it to do
the same - the generated code seems always correct, with and without the
changes.

I'm confused... Kaz, do you have any idea what could be going on there?


[Bug target/54089] [SH] Refactor shift patterns

2012-08-09 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #5 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 23:17:54 
UTC ---
OK, I checking out the preprocessed file reveals the following relevant pieces:

typedef struct page {
 struct list_head list;
 struct address_space *mapping;
 unsigned long index;
 struct page *next_hash;

 atomic_t count;
 unsigned long flags;//  

 struct list_head lru;

 struct page **pprev_hash;
 struct buffer_head * buffers;
} mem_map_t;


static inline zone_t *page_zone(struct page *page)
{
 return zone_table[page-flags  (64 - 8)];  // = page-flags  56
}

Depending on whether SHIFT_COUNT_TRUNCATED evals to 1 or 0, the function above
returns either zone_table[0] (dyn shifts) or zone_table[page-flags  24] (no
dyn shifts).  Both should be OK to do, since that's undefined behavior.
In this case maybe fixing SHIFT_COUNT_TRUNCATED to '0' would be better, since
that would produce faster undefined behavior code ;)


[Bug target/54089] [SH] Refactor shift patterns

2012-08-09 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #6 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 23:27:55 
UTC ---
Author: olegendo
Date: Thu Aug  9 23:27:51 2012
New Revision: 190273

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190273
Log:
PR target/54089
* config/sh/sh-protos (shift_insns_rtx): Delete.
(sh_ashlsi_clobbers_t_reg_p): Add.
* config/sh/sh.c (shift_insns, shift_amounts, ext_shift_insns,
ext_shift_amounts): Merge arrays of ints to array of structs.
Adapt usage of arrays throughout the file.
(shift_insns_rtx): Delete unused function.
(sh_ashlsi_clobbers_t_reg_p): New function.
* config/sh/sh.md (ashlsi3): Emit ashlsi3_n_clobbers_t insn if the
final shift sequence will clobber T_REG.
(ashlsi3_n): Split only if the final shift sequence will not
clobber T_REG.
(ashlsi3_n_clobbers_t): New insn_and_split.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md


[Bug target/54089] [SH] Refactor shift patterns

2012-08-09 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #7 from Oleg Endo olegendo at gcc dot gnu.org 2012-08-09 23:36:10 
UTC ---
Kaz, another thing I'm a bit unsure about ...

#define SH_DYNAMIC_SHIFT_COST \
  (TARGET_HARD_SH4 ? 1 : TARGET_DYNSHIFT ? (optimize_size ? 1 : 2) : 20)

Do you have any idea, why this is not simply
#define SH_DYNAMIC_SHIFT_COST (TARGET_DYNSHIFT) ?

I've checked the HW manuals for SH2A, SH3 and SH4 and all state that dynamic
shifts take one cycle, i.e. are as fast/slow as the other constant shift
insns...


  1   2   >