[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-04 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #18 from Vineet Gupta  ---
(In reply to Richard Biener from comment #9)
> (In reply to Linus Torvalds from comment #8)
> > (In reply to Alexander Monakov from comment #7)
> > > 
> > > Most likely the issue is that sout/sfrom are misaligned at runtime, while
> > > the vectorized code somewhere relies on them being sufficiently aligned 
> > > for
> > > a 'short'.
> > 
> > They absolutely are.
> > 
> > And we build the kernel with -Wno-strict-aliasing exactly to make sure the
> > compiler doesn't think that "oh, I can make aliasing decisions based on type
> > information".
> > 
> > Because we have those kinds of issues all over, and we know which
> > architectures support unaligned loads etc, and all the tricks with
> > "memcpy()" and unions make for entirely unreadable code.
> > 
> > So please fix the aliasing logic to not be type-based when people explicitly
> > tell you not to do that.
> > 
> > Linus
> 
> Note alignment has nothing to do with strict-aliasing (-fno-strict-aliasing
> you mean btw).
> 
> One thing we do is (I'm not 50% sure this explains the observed issue) assume
> that if you have two accesses with type 'short' and they are aligned
> according to this type then they will not partly overlap.  Note this has
> nothing to do with C strict aliasing rules but is basic pointer math when
> you know lower zero bits.

OK, given that source code has type short, they will assume these things are
short aligned and thus won't overlap for short accesses. But then the code
actually generated by loop vectorizer assumes they are 8 bytes apart - since
that is what it is generating.


> 
> I suggest to try the fix suggested in comment#7 and report back if that
> fixes the observed issue.

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-03 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #15 from Vineet Gupta  ---
(In reply to Linus Torvalds from comment #14)
> (In reply to Vineet Gupta from comment #13)
> > Sorry the workaround proposed by Alexander doesn't seem to cure it (patch
> > attached), outcome is the same
> 
> Vineet - it's not the ldd/std that is necessarily buggy, it's the earlier
> tests of the address that guard that vectorized path. 
> 
> So your quoted parts of the code generation aren't necessarily the
> problematic ones.

/me slaps myself. How can I be so stupid.

> Did you actually test the code and check whether it has the same issue?
> Maybe it changed the address limit guards before that ldd/std?

The problem is is indeed gone. I need to analyze the assembly fully how it
prevents the bad case. e.g. I'm still not comfortable seeing the loop entered
with following and it doing 8 byte ldd/std when we know it should only do 2 at
a time.

r21 = 0xbf178036  (pre-increment so 0x3e will be first src)
r22 = 0xbf1780b2
LPC = 4

80d9a360:   lp  12  ;80d9a36c 
80d9a364:   ldd.a   r18r19,[r21,8]
80d9a368:   std.ab  r18r19,[r22,8]

> I also sent you a separate patch to test if just upgrading to a newer
> version of the zlib code helps. Although that may be buggy for other
> reasons, it's not like I actually tested the end result.. But it would be
> interesting to hear if that one works for you (again, ldd/std might be a
> valid end result of trying to vectorize that code assuming the aliasing
> tests are done correctly in the vectorized loop headers).

Thx for that. And this seems to boot as well.

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-03 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #13 from Vineet Gupta  ---
Sorry the workaround proposed by Alexander doesn't seem to cure it (patch
attached), outcome is the same

mov lp_count,r13;5  #, bnd.65
lp  @.L201  ; lp_count:@.L50->@.L201#,
.align 2
.L50:
# ../lib/zlib_inflate/inffast.c:288: PUP(sout) = PUP(sfrom);
  ldd.a r18,[r21,8] # MEM[base: _496, offset: 0B], MEM[base: _496, offset: 0B]

# ../lib/zlib_inflate/inffast.c:288:  PUP(sout) = PUP(sfrom);
  std.ab r18,[r22,8] # MEM[base: vectp_prephitmp.73_741, offset: 0B], MEM[base:
_496, offset: 0B]

.align 2
.L201:
; ZOL_END, begins @.L50 #

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-03 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #12 from Vineet Gupta  ---
Created attachment 50742
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50742&action=edit
kernel patch as proposed on comment #7

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #6 from Vineet Gupta  ---
(In reply to Linus Torvalds from comment #4)
> (In reply to Andrew Pinski from comment #1)
> > The loop gets vectorized, I don't see the problem really.
> 
> 
> See
> 
>
> https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/372
> 
> and in particular the comment
> 
>"In the first 8-byte copy, src and dst overlap"
> 
> so apparently gcc has decided that they can't overlap, despite the two
> pointers being literally generated from the same base pointer.

Exactly:

> But I don't real arc assembly, so I'll have to take Vineet's word for it.

fwiw:
LDD.a [base, off] is 8-byte load with pre-incr : eff addr = base + offset
STD.ab [base, off] is 8-byte store with post-incr: eff addr = base


> Vineet, have you been able to generate a smaller test-case?

No I'm afraid not.

[Bug middle-end/100363] gcc generating wider load/store than warranted at -O3

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #3 from Vineet Gupta  ---
Created attachment 50723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50723&action=edit
preprocessed source file (with extra nop annotation)

[Bug c/100363] New: gcc generating wider load/store than warranted at -O3

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Bug ID: 100363
   Summary: gcc generating wider load/store than warranted at -O3
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vgupta at synopsys dot com
  Target Milestone: ---

Created attachment 50722
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50722&action=edit
test case with an additional nop to annotate codegen

In Linux kernel's initramfs gzip inflate code, an inner copy loop using
unsigned short pointers (src/dst) is generated with wider 8 or 16-byte at a
time (vs. 2 bytes at a time) causing extra/unintended bytes to be copied -
leading to corruption of inflated files on target.

The showed up on upstream v5.6 Linux kernel built for ARC (defaults to -O3).
Issue doesn't happen at -O2.

Full test case attached, but the gist of it is:

lib/zlib_inflate/inffast.c

if (dist > 2) {
unsigned short *sfrom;

sfrom = (unsigned short *)(from);
loops = len >> 1;
do
*sout++ = *sfrom++;

while (--loops);
out = (unsigned char *)sout;
from = (unsigned char *)sfrom;
}
...

@sfrom and @sout are unsigned short pointers and thus expected to work on 2
bytes. However at -O3 gcc is generating wide loads (8-byte LDD/STD on ARCv2,
16-byte LDR q0 on aarch64.

For aarch64, it seems there's code generated for 16-byte access as well as
2-byte, and I haven't verified if it elides the 16-byte code based on size etc
- but the code is generated nonetheless. For ARC 8-byte loop is certainly
executed causing bad things as described

The issue was originally seen with mainline gcc 10.2 (again both ARC and
aarch64) at -O3 and I can confirm it exists in gcc 9.3 as well.

Attaching preprocessed source file is from ARC linux build (but builds for
aarch64 too since non of arch specific functions are used here.

[Bug target/92846] [ARC] floating point compares not generating Invalid Operand

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92846

Vineet Gupta  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
Summary|[ARC] gloating point|[ARC] floating point
   |compares not generating |compares not generating
   |Invalid Operand |Invalid Operand

--- Comment #4 from Vineet Gupta  ---
Resolved via

2019-12-12 fbf8314b0a8d [ARC] generate signaling FDCMPF for hard float
comparisons

[Bug target/92845] [ARC] gcc not generating hardware compare instruction FDCMP for -mcpu=hs38_linux

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92845

Vineet Gupta  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Vineet Gupta  ---
Addressed via 

2019-12-12 48f13fb118fe [ARC] Use hardware support for double-precision compare
instructions.

[Bug target/89838] [ARC] ICE building glibc testsuite

2021-04-30 Thread vgupta at synopsys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89838

Vineet Gupta  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Vineet Gupta  ---
closing per fix pointed by Claudiu !

[Bug middle-end/95115] RISC-V 64: inf/inf division optimized out, invalid operation not raised

2020-07-02 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

Vineet Gupta  changed:

   What|Removed |Added

 CC||claziss at gmail dot com,
   ||vgupta at synopsys dot com

--- Comment #7 from Vineet Gupta  ---
FWIW this is also applicable to ARC glibc + upstream gcc 10

https://sourceware.org/pipermail/libc-alpha/2020-July/115679.html
https://sourceware.org/pipermail/libc-alpha/2020-July/115694.html

[Bug target/52451] gcc w/i387 float generates fucom rather than fcom for floating point comparsons

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52451

--- Comment #12 from Vineet Gupta  ---
oops the ARC bug is (PR 92846) not (PR 92845)

[Bug target/52451] gcc w/i387 float generates fucom rather than fcom for floating point comparsons

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52451

Vineet Gupta  changed:

   What|Removed |Added

 CC||vgupta at synopsys dot com

--- Comment #11 from Vineet Gupta  ---
ARC gcc backend suffers from this and I've created
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92845 and also a tentative fix
which passes gcc.dg/torture/pr52451.c 

However I'm curious that the test uses qNaN. What is the expected behavior for
sNaN. If you tweak the testcase  slightly as follows:

diff --git a/gcc/testsuite/gcc.dg/torture/pr52451.c
b/gcc/testsuite/gcc.dg/torture/pr52451.c

-  volatile TYPE nan##S = __builtin_nan##S ("");\
+  volatile TYPE nan##S = __builtin_nans##S ("");   \

With that even on ARM (RPI3) it now fails for the "quite" C operations "==" and
"!=" and isless(),isgreater(),islessequal(),isgreaterequal().

Is that expected, OK ? I guess there's no easy to fix this unless hardware
supports the 3 varaints or glibc code has a way to clear exception in certain
cases.

[Bug target/92846] [ARC] gloating point compares not generating Invalid Operand

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92846

--- Comment #2 from Vineet Gupta  ---
Created attachment 47438
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47438&action=edit
proposed fix

Ran full glibc tessuite with this: No regressions
gcc dejagnu test pr52451.c passes too

[Bug target/92846] [ARC] gloating point compares not generating Invalid Operand

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92846

--- Comment #1 from Vineet Gupta  ---
Test case:

int f(double x, double y)
{
return x > y;  // expected FDCMPF (qNaN, sNaN)
}

int f2(double x, double y)
{
return __builtin_isgreater(x, y);  // expected FDCMP (only sNan)
}

[Bug target/92846] New: [ARC] gloating point compares not generating Invalid Operand

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92846

Bug ID: 92846
   Summary: [ARC] gloating point compares not generating Invalid
Operand
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vgupta at synopsys dot com
CC: claziss at gcc dot gnu.org, vgupta at synopsys dot com
  Target Milestone: ---

This is similar to issues reported for other targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77918

The FDCMP instructions generated raised Invalid operand only for sNaN. This
causes glibc testsuite failures test-{double,float,...}-iseqsig which test both
qNaN and sNaN and expect either to pass.

[Bug target/92845] New: [ARC] gcc not generating hardware compare instruction FDCMP for -mcpu=hs38_linux

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92845

Bug ID: 92845
   Summary: [ARC] gcc not generating hardware compare instruction
FDCMP for -mcpu=hs38_linux
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vgupta at synopsys dot com
CC: claziss at gcc dot gnu.org, claziss at gmail dot com
  Target Milestone: ---

Test case:

int f(double x, double y)
{
return x > y;
}

arc-linux-gcc -mcpu=hs38_linux -c -O2 --save-temps

push_s blink
bl @__gtdf2;1
setgt r0, r0, 0
pop_s blink
j_s [blink]

[Bug target/89838] [ARC] ICE building glibc testsuite

2019-12-06 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89838

--- Comment #3 from Vineet Gupta  ---
Can this be closed ?

[Bug target/89877] [ARC] miscompilation due to missing cc clobber in longlong.h: add_ssaaaa()/sub_ddmmss()

2019-04-04 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89877

Vineet Gupta  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Vineet Gupta  ---
Marking as fixed, please backport to gcc-8-stable !

Thx,
-Vineet

[Bug target/89877] [ARC] miscompilation due to missing cc clobber in longlong.h: add_ssaaaa()/sub_ddmmss()

2019-03-28 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89877

Vineet Gupta  changed:

   What|Removed |Added

 CC||vgupta at synopsys dot com

--- Comment #1 from Vineet Gupta  ---
Created attachment 46052
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46052&action=edit
proposed fix

patch to fix the issue.

[Bug target/89877] New: [ARC] miscompilation due to missing cc clobber in longlong.h: add_ssaaaa()/sub_ddmmss()

2019-03-28 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89877

Bug ID: 89877
   Summary: [ARC] miscompilation due to missing cc clobber in
longlong.h: add_ss()/sub_ddmmss()
   Product: gcc
   Version: 8.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vgupta at synopsys dot com
  Target Milestone: ---

Created attachment 46051
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46051&action=edit
test case, build with -O2 to show issue

A glibc build with -mcpu=hs4x sowed weird print values for test case below
(originally showed in multibench test harness printing wrong values)

void main(int argc, char *argv[])
{
size_t total_time = 115424
double secs = (double)total_time/(double)1000;
printf("%s %d %lf\n", "secs", total_time, secs);  // prints 113.504
printf("%d\n", (size_t)secs);
}

The code path leads to glibc stdlib/divrem.c: __mpn_divrem() which in turn uses
target defined inline asm macros in stdlib/longlong.h (which in turns is
sync'ed from gcc include/longlong.h)

These inline macros clobber the cpu flags, but fail to add "cc" in clobber
list.
This causes gcc to schedule a flag setting CMP instruction (or ADD.f) before
the clobbering ADD.f/SUB.f instructions, causing a subsequent conditional
branch to use a stale flag.

__mpn_divrem:
...
.L135:
...
st-1,[r0]
cmp_s r10,-1<-- intended flag
sub   r0,r0,4
sub   r4,r2,r9
add.f r2, r18, r9   <-- clobbered
adc   r3, r4, 0
beq_s @.L72 <-- stale flag used

-mcpu=hs4x + cc clobber fix
---
st-1,[r0]
sub   r4,r2,r9
sub   r0,r0,4
add.f r2, r18, r9
adc   r3, r4, 0
cmp_s r10,-1<-- intended flag
beq_s @.L72 <-- right flag used

The issue doesn't happen with default -mpcu=hs38 as the instruction scheduling
already delays the CMP for some reason.

-mcpu=hs38
--
st-1,[r0]
sub   r4,r2,r9
sub   r0,r0,4
add.f r2, r18, r9
adc   r3, r4, 0
cmp_s r10,-1
beq_s @.L72

[Bug target/89838] New: [ARC] ICE building glibc testsuite

2019-03-26 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89838

Bug ID: 89838
   Summary: [ARC] ICE building glibc testsuite
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vgupta at synopsys dot com
  Target Milestone: ---

Created attachment 46028
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46028&action=edit
test case

An ICE was triggered when building glibc testsuite (after a glibc rebase)
against upstream gcc-8-branch (as of commit below)

2019-01-29 0d5ba57508c5 Backport ARC patches.


Triggering source code file attached.

arc-buildroot-linux-gnu-gcc -O2 -c tst-tls1.i

[Bug rtl-optimization/88001] ASMCONS cannot handle properly UNSPEC(CONST)

2018-12-14 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88001

--- Comment #11 from Vineet Gupta  ---
Sure, but how can I ? if i click the "known to work" field it takes me to help.

The issue certainly with gcc-8-branch for ARC and presumably also with
tip/trunk.

[Bug rtl-optimization/88001] ASMCONS cannot handle properly UNSPEC(CONST)

2018-12-14 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88001

--- Comment #9 from Vineet Gupta  ---
Can this be stable backported to gcc-8-branch as well.
glibc folks use that branch for their regular smoke testing and without that
ARC tools don't even build.

[Bug rtl-optimization/88001] ASMCONS cannot handle properly UNSPEC(CONST)

2018-12-11 Thread vgupta at synopsys dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88001

Vineet Gupta  changed:

   What|Removed |Added

 CC||vgupta at synopsys dot com

--- Comment #5 from Vineet Gupta  ---
Hi, I'm the ARC glibc maintainer (and desperate for this fix).

The proposed (one liner) patch seems to fix the compiler assert for sue. I'm
still running more tests to make sure it is functional but looks promising.