[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-17 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #39 from Levy  ---
Checked all pass from 250r.shorten_memrefs to 270r.ce2

In 269r.combine I saw the following combination merged the replaced address:
---
modifying insn i327: r92:DI=r96:DI+0x300
  REG_DEAD r96:DI
deferring rescan insn with uid = 27.
(!)allowing combination of insns 27 and 6
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i227: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3 6: r82:DI=sign_extend([r96:DI+0x320])
  REG_DEAD r96:DI
deferring rescan insn with uid = 6.
(!)allowing combination of insns 27 and 8
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i227: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3 8: r84:DI=sign_extend([r96:DI+0x324])
  REG_DEAD r96:DI
deferring rescan insn with uid = 8.
(!)allowing combination of insns 27 and 12
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i227: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i312: r87:DI=sign_extend([r96:DI+0x328])
  REG_DEAD r96:DI
deferring rescan insn with uid = 12.
(!)allowing combination of insns 27 and 16
original costs 4 + 16 = 20
replacement cost 16
deferring deletion of insn with uid = 27.
modifying insn i316: r90:DI=sign_extend([r96:DI+0x32c])
  REG_DEAD r96:DI
deferring rescan insn with uid = 16.
allowing combination of insns 18 and 19
---
Where in 268r.ud_dce both insns 27 are same (except for memory address):

(insn 27 26 28 2 (set (reg:DI 10 a0)
(lo_sum:DI (reg/f:DI 85)
(symbol_ref/f:DI ("*.LC0") [flags 0x82]  ))) "array_test.c":21:5 133 {*lowdi}
 (expr_list:REG_DEAD (reg/f:DI 85)
(expr_list:REG_EQUAL (symbol_ref/f:DI ("*.LC0") [flags 0x82]  )
(nil
---
In 270r.combine (patched):

(note 27 3 6 2 NOTE_INSN_DELETED)

and following insns 768 + 32/36/40/44 were put back as:

(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}


(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}


(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}

(insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
 (expr_list:REG_DEAD (reg:DI 96)
(nil)))

Maybe combine.c needs some modification?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-17 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #38 from Levy  ---
Created attachment 49575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49575=edit
riscv-shorten-memrefs_V1.patch

Did little bit change in get_si_mem_base_reg() and transform()
Now for the same c input array_test.c

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

int main ()
{
int arr[300]= {0};
arr[200] = 15;
arr[201] = -33;
arr[202] = 7;
arr[203] = -999;
int b = load1r(arr);
printf("Result: %d\n",b);
return 0;
}

in loadlr, when put a debug_rtx(pat) after:

(unpatched)XEXP (pat, i) = replace_equiv_address (mem, addr); 
or 
(patched)XEXP (XEXP (pat, I), 0) = replace_equiv_address(XEXP (mem, 0), addr);



unpatched gcc will produce follwing insns:
-
(set (reg:SI 81 [ MEM[(int *)array_5(D) + 800B] ])
(mem:SI (plus:DI (reg:DI 88)
(const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4 A32]))
(set (reg:SI 82 [ MEM[(int *)array_5(D) + 804B] ])
(mem:SI (plus:DI (reg:DI 89)
(const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4 A32]))
(set (reg:SI 84 [ MEM[(int *)array_5(D) + 808B] ])
(mem:SI (plus:DI (reg:DI 90)
(const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4 A32]))
(set (reg:SI 86 [ MEM[(int *)array_5(D) + 812B] ])
(mem:SI (plus:DI (reg:DI 91)
(const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4 A32]))
-


patched gcc will produce follwing insns:
-
(set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 92)
(const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])))
(set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 93)
(const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])))
(set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 94)
(const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])))
(set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 95)
(const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4
A32])))
-
which the patched one looks ok for me, but the final assembly code still shows
no change (both under -Os)

Not quite sure where the problem is, I'll have a look near
array_test.c.250r.shorten_memrefs tomorrow.

Please ignore the coding style as it's just a debug patch

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-15 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #37 from Kito Cheng  ---
Maybe we could add a parameter to indicate the type of memory access,
plain_mem, zext_mem or sext_mem for pass_shorten_memrefs::get_si_mem_base_reg.

e.g.
  for (int i = 0; i < 2; i++)
{   
  rtx mem = XEXP (pat, i); 
  rtx addr;
  mem_access_type_t type;
  if (get_si_mem_base_reg (mem, , ))
{   
  HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
  /* Do not transform store zero as these cannot be compressed.  */
  if (i == 0)
{   
  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1
continue;
}   
  if (m->get_or_insert (regno) > 3)
{
  addr
= targetm.legitimize_address (addr, addr, GET_MODE (mem));
  switch (type)
{
case plain_mem:
  XEXP (pat, i) = replace_equiv_address (mem, addr);
  break;
case zext_mem:
  ...
  break;
case sext_mem:
  ...
  break;
}
  df_insn_rescan (insn);
}   
}

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-15 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #36 from Levy  ---
It seems get_si_mem_base_reg() were called repeatly FOR_BB_INSNS from both
pass_shorten_memrefs::analyze and pass_shorten_memrefs::transform

Correct me if I'm wrong:
It seems we need some data structure (a linked list should work) to store the
zero/sign extend when we strip it off like:

if (GET_CODE (mem) == ZERO_EXTEND
  || GET_CODE (mem) == SIGN_EXTEND)
mem = XEXP (mem, 0);
in each insns.

Then in pass_shorten_memrefs::transform(), each time get_si_mem_base_reg() is
called, we check whether if we need to put it back.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-12 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #35 from Jim Wilson  ---
By combine issue, are you referring to the regression I mentioned in comment 3
and filed as bug 97747?  We can handle that as a separate issue.  It should be
uncommon.  I expect to get much more benefit from this patch than the downside
due to that combine issue.

As for the shorten-memrefs problem, I didn't notice that one in my testing.  It
does need to be fixed.  Taking a look now, it looks pretty simple to fix.  The
code currently looks for MEM, it needs to handle (SIGN_EXTEND (MEM)) and
((ZERO_EXTEND (MEM)).  See the get_si_mem_base_reg function.  You need to skip
over the sign_extend or zero_extend when looking fot the mem at the first call.
 Then at the second call you need to be careful to put the sign_extend or
zero_extend back when creating the new RTL.  Maybe just another arg to
get_si_mem_base so it can record the parent rtx code of the mem.  Or maybe do
this outside get_si_mem_base to skip over a sign/zero extend at the first call,
and then do the same at the second call but record what rtx we skipped over so
that we can put it back.  We can either handle this here or as another patch. 
But since you have some time while waiting for paperwork maybe you can try
writing a fix.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-11 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #34 from Levy  ---
(In reply to Jim Wilson from comment #33)
> I did say that I'm willing to fix code style issues.  All major software
> projects I have worked with have coding style conventions.  It makes it
> easier to maintain a large software base when everything is using the same
> coding style.  We do have info to help you follow the GNU coding style.  See
> https://gcc.gnu.org/wiki/FormattingCodeForGCC
> which has emacs and vim settings to get the right code style.

No problem and thank you, Jim, I'll try to catch up the coding style.
what about the combine issue and shorten-memrefs? 
Shall we fix it here or someplace else?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-11 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #33 from Jim Wilson  ---
I did say that I'm willing to fix code style issues.  All major software
projects I have worked with have coding style conventions.  It makes it easier
to maintain a large software base when everything is using the same coding
style.  We do have info to help you follow the GNU coding style.  See
https://gcc.gnu.org/wiki/FormattingCodeForGCC
which has emacs and vim settings to get the right code style.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49542|0   |1
is obsolete||

--- Comment #32 from Levy  ---
Created attachment 49543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49543=edit
QI/HI/SImode auto Zero/Sign-extend

Added one missing space at the end of the comment.
Added one tab before each brace.
Replace all tabs with space (come on)

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49536|0   |1
is obsolete||

--- Comment #31 from Levy  ---
Created attachment 49542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49542=edit
QI/HI/SImode auto Zero/Sign-extend

Much appreciated Jim.

The new patch should solve the format issue by adding the same tabs on each
line.

In the meanwhile I'll try to patch the pass_shorten_memrefs::analyze() in
riscv-shorten-memrefs.c

Any idea on the combiner issue?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #30 from Jim Wilson  ---
Looking at your v2 patch, the first verison fails because you are passing
mismatched modes to emit_move_insn.  The version with gen_lowpart solves that
problem, but fails because of infinite recursion.

The v4 patch looks good.  There are some coding style issues, but I can always
fix them for you.  There should be a space before open paren.  The first &&
should line up with the last two.  The braces should be indented two more
spaces.  The comment needs to end with a period and two spaces.

In the comment, instead of "Separate ... to ..." I'd say "Expand ... to ..." or
maybe "Emit ... as ...".

Now we need the copyright assignment paperwork.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49534|0   |1
is obsolete||

--- Comment #29 from Levy  ---
Created attachment 49536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49536=edit
QI/HI/SImode auto Zero/Sign-extend

Finally, make gen_extend_insn() seems to work with auto-extension based on Jim
and Kito's idea.

Just 10 lines of code at the beginning of riscv_legitimize_move() in
riscv-gcc/gcc/config/riscv.c

if (GET_MODE_CLASS (mode) == MODE_INT
&& GET_MODE_SIZE (mode) < UNITS_PER_WORD
  && can_create_pseudo_p()
  && MEM_P (src))
  {
  rtx temp_reg = gen_reg_rtx (word_mode);
  int zero_sign_extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
  emit_insn(gen_extend_insn(temp_reg, src, word_mode, mode,
zero_sign_extend));
  riscv_emit_move(dest, gen_lowpart(mode, temp_reg));
  return true;
  }

Haven't make report-gcc, but already passed 2-stage make. 
I'll have a test later.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49533|0   |1
is obsolete||

--- Comment #28 from Levy  ---
Created attachment 49534
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49534=edit
V4 patch for QI/HI/SI/DI zero/sign-extend for RV32/64/128

Suggest by Kito, The V4 patch moved the gen_extend_insn_auto() to riscv.c and
was included in riscv-protos.h since it handles riscv backend only.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #27 from Levy  ---
(In reply to Kito Cheng from comment #25)
> Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those
> functions are RISC-V specific, so I would suggest you put in riscv.c and
> riscv-protos.h.

No problem, I'll make a new patch.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49532|0   |1
is obsolete||

--- Comment #26 from Levy  ---
Created attachment 49533
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49533=edit
QI/HI/SI/DI zero/sign-extend for RV32/64/128

BUG fix for unimplemented code

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #25 from Kito Cheng  ---
Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those
functions are RISC-V specific, so I would suggest you put in riscv.c and
riscv-protos.h.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49524|0   |1
is obsolete||

--- Comment #24 from Levy  ---
Created attachment 49532
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49532=edit
QI/HI/SI/DI zero/sign-extend for RV32/64/128

Rewrote the third proposed patch.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49470|0   |1
is obsolete||
  Attachment #49500|0   |1
is obsolete||

--- Comment #23 from Levy  ---
Created attachment 49524
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49524=edit
Third test patch

While I'm waiting for a solution, I've reused my second patch and made a new
patch.
Third test patch adds one extra function gen_extend_insn_auto() in optabs.c/h
then just called by riscv_legitimize_move()
Now it can emit sign/unsigned-extend insn automatically. 

Still haven't solved the shorten-memrefs issue. So it will still raise 2 error
when make report-gcc
So the -Os option (size optimization) may not behave as expected.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #22 from Levy  ---
Under condition 

if (GET_MODE_CLASS (mode) == MODE_INT
  && GET_MODE_SIZE (mode) < UNITS_PER_WORD
&& can_create_pseudo_p()
&& MEM_P (src))

with var:

rtx temp_reg;
int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);

I've tried the combination of:

gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend);
gen_extend_insn (temp_reg, force_reg (word_mode, src), word_mode, word_mode,
extend);
gen_extend_insn (temp_reg, src, word_mode, mode, extend);

with:
riscv_emit_move(dest, gen_lowpart (mode, temp_reg));
riscv_emit_move(dest, force_reg(mode, temp_reg));

then return true

All raised segfault during make gcc.

For example:

  if (GET_MODE_CLASS (mode) == MODE_INT
  && GET_MODE_SIZE (mode) < UNITS_PER_WORD
&& can_create_pseudo_p()
&& MEM_P (src))
  {
rtx temp_reg;
int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend);
riscv_emit_move(dest, force_reg(mode, temp_reg));
return true;
  }
At beginning of riscv_legitimize_move()

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #21 from Jim Wilson  ---
I submitted my testcase as 97747 so it will get more attention.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #20 from Jim Wilson  ---
Maybe convert_to_mode is recursively calling convert_to_mode until you run out
of stack space.  You can use --save-temps to generate a .i preprocessed file
form your input testcasxe, then run cc1 under gdb.  You need to give the
default arch/abi options or you will get an error
(gdb) run -march=rv64gc -mabi=lp64d -O2 tmp.i
when look at the backtrace when you hit the segfault.

There are other ways to get the  zero extend we need here.  We could just
generate the RTL for the insn directly instead of calling the
gen_zero_extendXiYi2 functions because we know that they exist and what form
they are in.  This is inconvenient though.  We could try querying the optabs
table to get the right insn code.  There is a gen_extend_insn function that
looks like it would work and is more direct than convert_to_mode. 
gen_extend_insn does require that the input is in a form that the convert will
accept, so it must be forced to a register if it isn't already a register. 
Another solution would be to use one of the rtx simplier functions, e.g. you
can do
 simplify_gen_unary (ZERO_EXTEND, word_mode, src, mode);
but the simplify functions are really for simplifying complex RTL to simpler
RTL, so I think not the right approach here.

I think gen_extend_insn is the best approach if we can't use convert_to_mode.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #19 from Levy  ---
Also tested code without int extend, just zero-extend with unsignedp set to 1,
Still seg fault.

if (GET_MODE_CLASS (mode) == MODE_INT
  && GET_MODE_SIZE (mode) < UNITS_PER_WORD
&& can_create_pseudo_p()
&& MEM_P (src))
  {
rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1));
riscv_emit_move(dest, temp_reg);
return true;
  }

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #18 from Levy  ---
if (GET_MODE_CLASS (mode) == MODE_INT
  && GET_MODE_SIZE (mode) < UNITS_PER_WORD
&& can_create_pseudo_p()
&& MEM_P (src))
  {
int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src,
extend));
riscv_emit_move(dest, temp_reg);
return true;
  }

tried to insert code at the beginning of riscv_legitimize_move() but seems
convert_to_mode() raised seg fault druing make

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #17 from Jim Wilson  ---
Yes, LOAD_EXTEND_OP is a good suggestion.  We can maybe do something like
  int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
  temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, extend));

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #16 from Kito Cheng  ---
> Or maybe we make the choice of zero-extend or sign-extend depend on the mode, 
> and use zero-extend for smaller than SImode and sign-extend for SImode and 
> larger.


Maybe depend on LOAD_EXTEND_OP?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #15 from Jim Wilson  ---
I tried testing your first patch.  I just built unpatched/patch
rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like
libc, libstdc++, libm.

I managed to find a testcase from glibc that gives worse code with the patch.
struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Without the patch we get
sub:
lui a5,%hi(s)
addia5,a5,%lo(s)
lbu a4,1(a5)
ori a4,a4,5
sb  a4,1(a5)
ret
and with the patch we get
sub:
lui a4,%hi(s)
lbu a5,%lo(s)(a4)
andia5,a5,-6
ori a5,a5,5
sb  a5,%lo(s)(a4)
ret
Note the extra and instruction.

This seems to be a combine problem.  With the patched compiler, I see in the
-fdump-rtl-combine-all dump file
Trying 9 -> 11:
9: r79:DI=r78:DI&0xfffa
  REG_DEAD r78:DI
   11: r81:DI=r79:DI|0x5
  REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 81)
(ior:DI (and:DI (reg:DI 78 [ MEM  [(struct  *)] ])
(const_int 250 [0xfa]))
(const_int 5 [0x5])))
Combine knows that reg 78 only has 8 nonzero bits, so it simplified the AND -6
to AND 250.  If combine had left that constant alone, or if maybe combine
propagated that info about nonzero bits through to r81, then it should have
been able to optimize out the AND operation.  This does work when the load does
not zero extend by default.

The ARM port shows the exact same problem.  I see
sub:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movwr2, #:lower16:.LANCHOR0
movtr2, #:upper16:.LANCHOR0
ldrbr3, [r2]@ zero_extendqisi2
bic r3, r3, #5
orr r3, r3, #5
strbr3, [r2]
bx  lr
and the bic (bit clear) is obviously unnecessary.

This probably should be submitted as a separate bug if we don't want to fix it
here.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #14 from Jim Wilson  ---
Actually, for the SImode to DImode conversion, zero extending is unlikely to be
correct in that case.  The rv64 'w' instructions require the input to be
sign-extended from 32-bits to 64-bits, so a sign-extend is probably required. 
There will be a similar consideration for rv128 when we get to that.  So maybe
we should only handle QImode and HImode here.

Or maybe we make the choice of zero-extend or sign-extend depend on the mode,
and use zero-extend for smaller than SImode and sign-extend for SImode and
larger.

For qimode, char is unsigned by default, so zero extension is likely the right
choice.  For himode, it isn't clear which is best, but the arm port does a zero
extend.  Also, the Huawei code size proposal says that zero exnteded byte and
short loads are more important than sign extended byte and short load, so that
is more evidence that zero extend is more useful in those cases.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #13 from Jim Wilson  ---
The attachments show the entire riscv.c file being deleted and then readded
with your change.  This is due to a line ending problem.  The original file has
the unix style linefeed and the new file has the windows style carriage return
and linefeed.  Git has a setting called core.autocrlf that can help with this. 
This will require using git diff instead of just diff though to generate
patche, but should give better diffs.  Or alternatively, maybe to can force the
original file to have msdos line endings before you make the diff, e.g. maybe
just loading the original file into your editor and saving it without making
changes will fix the line endings.

You have in the second patch
(mode == QImode || mode == SImode || mode == HImode)
which is wrong but harmless for rv32 since we can't extend SImode, and is also
wrong for the eventual rv128 support.  You can fix this by using something like
(GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD
There is one place in riscv_legitimize_move that already uses this.

The code inside the if statement is a lot more verbose then necessary.  You can
use some helper functions to simplify it.  First you can use word_mode which is
DImode for rv64 and SImode for rv32 (and will be OImode for rv128).  Then you
can call a helper to do the mode conversion.  So for instance something like
  temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1));
should work.  That should emit an insn to do the zero extend and put it in a
reg.  Now you no longer need to check src mode or TARGET_64BIT as the code is
the same in all cases.  So it should just be about 3 or 4 lines of code for the
body of the if statement.

You have a check for REG_P (dest).  This is stricter than you need, since it
only works for REG and doesn't accept SUBREG.  We should handle both.  Also, it
isn't hard to also handle non-reg or non-subreg dests.  You just need to force
the source to a reg, and you already did that when you generated the
zero_extend operation.  So this code looks like it should work for any dest and
you should be able to drop the REG_P (dest) check.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-03 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #12 from Levy  ---
(In reply to Kito Cheng from comment #11)
> > Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c
> 
> Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend
> with memory.
> 
> You can take a look into
> riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze and add logic to
> handle zero_extend and sign_extend.
> 
> 
> > With one instruction less, the patched one seems right and even faster to 
> > me. However we still need a test on sign extend and check performance
> 
> shorten_memrefs is optimize for size, the idea is transform several load
> instructions with large immediate to a load immediate instruction and load
> instructions with small immediate, to use C-extensions instruction as
> possible.
> 
> so the instruction count seems increased, but the code size is smaller.

Thank you cheng, I'll have a look.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-03 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #11 from Kito Cheng  ---
> Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c

Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend with
memory.

You can take a look into riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze
and add logic to handle zero_extend and sign_extend.


> With one instruction less, the patched one seems right and even faster to me. 
> However we still need a test on sign extend and check performance

shorten_memrefs is optimize for size, the idea is transform several load
instructions with large immediate to a load immediate instruction and load
instructions with small immediate, to use C-extensions instruction as possible.

so the instruction count seems increased, but the code size is smaller.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-03 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #10 from Levy  ---
Created attachment 49500
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49500=edit
Optimzation Patch for QI/HImode(32bit) and QI/HI/SImode(64bit)

Proposing second patch for QI/HImode(32bit) and QI/HI/SImode(64bit)
Both Zero-Extend & Subreg

Tested with make report-gcc
Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c

Both were failed due to dejaGNU rule:
/* { dg-final { scan-assembler "load1r:\n\taddi" } } */

But if we look at the assembly code, for same input in both file:

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

Current gcc risc-v port will produce:
load1r:
addia5,a0,768
lw  a4,36(a5)
lw  a0,32(a5)
addwa0,a0,a4
lw  a4,40(a5)
addwa4,a4,a0
lw  a0,44(a5)
addwa0,a0,a4
ret
Patched new port will produce:
load1r:
lwu a4,800(a0)
lwu a5,804(a0)
addwa5,a5,a4
lwu a4,808(a0)
lwu a0,812(a0)
addwa5,a5,a4
addwa0,a5,a0
ret
With one instruction less, the patched one seems right and even faster to me.
However we still need a test on sign extend and check performance

Test case and performance evaluation will be provided later (hopefully)

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-30 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #9 from Levy  ---
Thanks Jim. See u on Monday.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-30 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #8 from Levy  ---
Created attachment 49470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49470=edit
optimization fix for BUG 97417

proposing a temp patch here in case someone needs it, then I'll submit a full
patch with test case later.

Following code was added to the riscv_legitimize_move () in the
riscv-gcc/gcc/config/riscv/riscv.c

  if(mode == QImode && MEM_P (src) && REG_P (dest) && can_create_pseudo_p())
  {
rtx temp_reg;

if (TARGET_64BIT)
{
  temp_reg = gen_reg_rtx (DImode);
  emit_insn(gen_zero_extendqidi2(temp_reg, src));
}
else
{
  temp_reg = gen_reg_rtx (SImode);
  emit_insn(gen_zero_extendqisi2(temp_reg, src));
}

riscv_emit_move(dest, gen_lowpart(QImode,temp_reg));
return true;
  }

Tested with make report-gcc, haven't done the regression/performance test yet:

= Summary of gcc testsuite =
| # of unexpected case / # of unique unexpected
case
|  gcc |  g++ | gfortran |
 rv64imafdc/  lp64d/ medlow |0 / 0 |0 / 0 |  - |

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-29 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #7 from Jim Wilson  ---
That patch is basically correct.  I would suggest using gen_lowpart instead of
gen_rtx_SUBREG as a minor cleanup.  It will do the same thing, and is shorter
and easier to read.

There is one problem here that you can't generate new pseudo registers during
register allocation, or after register allocation is complete.  So you need to
disable this optimization in this case.  You can do that by adding a check for
can_create_pseudo_p ().  This is already used explicitly in one place in
riscv_legitimize_move and implicitly in some subfunctions, and is used in the
arm.md movqi pattern.

The next part is testing the patch.  We need some correctness testing.  You can
just run the gcc testsuite for that.  And we need some code size/performance
testing.  I'd suggest compiling some code with and without the patch and check
function sizes and look for anything that got bigger with the patch, and check
to see if it is a problem.  I like to use the toolchain libraries like libc.a
and libstdc++.a since they are being built anways, but using a nice benchmark
is OK also as long as it is big enough to stress the compiler.

If the patch passes testing, then we can look at expanding the scope to handle
more modes, and also handle MEM dest in addition to REG dest.

Yes, we can discuss this Monday.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-28 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

 CC||admin at levyhsu dot com

--- Comment #6 from Levy  ---
Hi Jim

Levy from StarFive. 

Adding following code to the head of riscv_legitimize_move() according to your
reply seems to solve the problem:

if(mode == QImode && MEM_P (src) && REG_P (dest))
  {
rtx temp_reg;
if (TARGET_64BIT)
{
  temp_reg = gen_reg_rtx (DImode);
  emit_insn(gen_zero_extendqidi2(temp_reg, src));
}
else
{
  temp_reg = gen_reg_rtx (SImode);
  emit_insn(gen_zero_extendqisi2(temp_reg, src));
}

riscv_emit_move(dest, gen_rtx_SUBREG(QImode,temp_reg,0));
return true;
  }

same foo.c will produce:
foo:
lui a5,%hi(active)
lbu a5,%lo(active)(a5)
li  a0,42
bne a5,zero,.L6
ret
.L6:
li  a0,-42
ret
.size   foo, .-foo
.ident  "GCC: (GNU) 10.2.0"

Not sure if I'm doing it right, especially for 64bit DImode because I've only
been with gcc for a month. Just wonder if you have time after Monday's compiler
meeting so we may discuss movsi, movhi and MEM to MEM copy.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-27 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #5 from Jim Wilson  ---
Yes, the volatile is the problem.  We need to disable some optimizations like
the combiner to avoid breaking the semantics of volatile.  However, if you try
looking at other ports, like arm, you can see that they don't have this problem
because they generate different RTL at the start and hence do not need the
combiner to generate the sign-extended load.  So the proposal here is that we
modify the RISC-V gcc port to also emit the sign-extended load at RTL
generation time, which solves this problem. And then we need to do some testing
to make sure that this actually generates good code in every case, as we don't
want to accidentally introduce a code size or performance regression while
fixing this volatile optimization problem.

If you are curious about the combiner issue, see the init_recog_no_volatile
call in combine.c.  If you comment that out, the andi will be optimized away. 
But we can't remove that call, because that would break programs using
volatile.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-27 Thread jiawei at iscas dot ac.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #4 from jiawei  ---
I had did some tests with this problem and find:

foo.c

#include 

extern volatile bool active;

int foo(void)
{
  if (!active) {
return 42;
  } else {
return -42;
  }
}

code generated in foo.s

foo:
lui a5,%hi(active)
lbu a5,%lo(active)(a5)
li  a0,42
andia5,a5,0xff
beq a5,zero,.L2
li  a0,-42

When we remove the keyword `volatile`

foo_without_volatile.c

#include 

extern bool active;

int foo(void)
{
  if (!active) {
return 42;
  } else {
return -42;
  }
}

code generated in foo_without_volatile.s

foo:
lui a5,%hi(active)
lbu a5,%lo(active)(a5)
li  a0,42
beq a5,zero,.L2
li  a0,-42

and then we change the type from `bool` to `int`

foo_int.c

#include 

extern volatile int active;

int foo(void)
{
  if (!active) {
return 42;
  } else {
return -42;
  }
}

code generated in foo_int.s

foo:
lui a5,%hi(active)
lw  a5,%lo(active)(a5)
li  a0,42
sext.w  a5,a5
beq a5,zero,.L2
li  a0,-42

the `sext.w` instruction replace the `andi`

We also remove the keyword `volatile` in foo_int_without_volatile.c

#include 

extern int active;

int foo(void)
{
  if (!active) {
return 42;
  } else {
return -42;
  }
}

code generated in foo_int_without_volatile.s also look like optimized

foo:
lui a5,%hi(active)
lw  a5,%lo(active)(a5)
li  a0,42
beq a5,zero,.L2
li  a0,-42

Maybe this problem is due to the keyword `volatile`.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-20 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #3 from Jim Wilson  ---
The basic idea here is that the movqi pattern in riscv.md currently emits RTL
for a load that looks like this
  (set (reg:QI target) (mem:QI (address)))
As an experiment, we want to try changing that to something like this
  (set (reg:DI temp) (zero_extend:DI (mem:DI (address
  (set (reg:QI target) (subreg:QI (reg:DI temp) 0))
The hope is that the optimizer will combine the subreg with following
operations resulting in smaller faster code at the end.  And this should also
solve the volatile load optimization problem.  So we need a patch, and then we
need experiments to see if the patch actually produces better code on real
programs.  It should be fairly easy to write the patch even if you don't have
any gcc experience.  The testing part of this is probably more work than the
patch writing.

The movqi pattern calls riscv_legitmize_move in riscv.c, so that would have to
be modified to look for qimode loads from memory, allocate a temporary
register, do a zero_extending load into the temp reg, and then a subreg copy
into the target register.

You will probably also need to handle cases where both the target and source
are memory locations, in which case this already gets split into two
instructions, a load followed by a store.

You can look at the movqi pattern in arm.md file to see an example of how to do
this, where it calls gen_zero_extendqisi2.  Though for RISC-V, we would want
gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32.

If the movqi change works, then we would want similar changes for movhi and
maybe also movsi for rv64.

It might also be worth checking whether zero-extend or sign-extend is the
better choice.  We zero extend char by default, so that should be best.  For
rv64, the hardware sign extends simode to dimode by default, so sign-extend is
probably best for that.  For himode I'm not sure, I think we prefer sign-extend
by default, but that isn't necessarily the best choice for loads.  This would
have to be tested.

You can see rtl dumps by using -fdump-rtl-all.  The combiner is the pass that
should be optimizing away the unnecessary zero-extend.  You can see details of
what the combiner pass is doing by using -fdump-rtl-combine-all.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-20 Thread jiawei at iscas dot ac.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

jiawei  changed:

   What|Removed |Added

 CC||jiawei at iscas dot ac.cn

--- Comment #2 from jiawei  ---
(In reply to Jim Wilson from comment #1)
> Comparing with the ARM port, I see that in the ARM port, the movqi pattern
> emits
> (insn 8 7 9 2 (set (reg:SI 117)
> (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8])))
> "tmp.c\
> ":7:7 -1
>  (nil))
> (insn 9 8 10 2 (set (reg:QI 116)
> (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1
>  (nil))
> and then later it combines the subreg operation with the following
> zero_extend and cancels them out.
> 
> Whereas in the RISC-V port, the movqi pattern emits
> (insn 9 7 10 2 (set (reg:QI 76)
>   (mem/v/c:QI (lo_sum:DI (reg:DI 74)
> (symbol_ref:DI ("active") [flags 0xc4]   0x7f9f0310312\
> 0 active>)) [1 active+0 S1 A8])) "tmp.c":7:7 -1
>  (nil))
> and then combine refuses to combine the following zero-extend with this insn
> as the memory operation is volatile.
> 
> So it seems we need to rewrite the RISC-V port to make movqi and movhi zero
> extend to si/di mode and then subreg.  That probably will require cascading
> changes to avoid code size and performance regressions.
> 
> Looks like a tractable small to medium size project, but will need to wait
> for a volunteer to work on it.

Hi Jim, My name is Jiawei Chen. I am from the PLCT Lab. I had recurrented this
bug, and want to try to help fixing this bug. What should I modify,is there any
suggestions?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-15 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-10-15
 Ever confirmed|0   |1

--- Comment #1 from Jim Wilson  ---
Comparing with the ARM port, I see that in the ARM port, the movqi pattern
emits
(insn 8 7 9 2 (set (reg:SI 117)
(zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8])))
"tmp.c\
":7:7 -1
 (nil))
(insn 9 8 10 2 (set (reg:QI 116)
(subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1
 (nil))
and then later it combines the subreg operation with the following zero_extend
and cancels them out.

Whereas in the RISC-V port, the movqi pattern emits
(insn 9 7 10 2 (set (reg:QI 76)
(mem/v/c:QI (lo_sum:DI (reg:DI 74)
(symbol_ref:DI ("active") [flags 0xc4]  )) [1 active+0 S1 A8])) "tmp.c":7:7 -1
 (nil))
and then combine refuses to combine the following zero-extend with this insn as
the memory operation is volatile.

So it seems we need to rewrite the RISC-V port to make movqi and movhi zero
extend to si/di mode and then subreg.  That probably will require cascading
changes to avoid code size and performance regressions.

Looks like a tractable small to medium size project, but will need to wait for
a volunteer to work on it.