[Bug c++/90029] optimizing local exceptions, or are they an observable side effect

2019-04-09 Thread federico.kircheis at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90029

--- Comment #2 from Federico Kircheis  ---
Thank you for your answer, I need to learn better how to search for related
bugs.

The bugs you linked do surely answer my question, but they do not cover exactly
the same requests.

1) optimize dead call to `std::terminate`.


2) remove typeinfo information for types that can never escape the local scope 

The other tickets are more generic; in the general case it will not be possible
to remove the typeinfo by just looking at a function.
In my example `foo` is defined, thrown and caught inside `bar_ex*`, which is
not the common case when using exceptions (normally the error class is defined
globally or in a namespace).


Should I maybe open a feature request for those two missed optimizations?
They are related to optimizing exceptions, but not completely dependent on it
(I think, at least).

[Bug c/90034] gcc hangs on wait4 after vfork after opening tmp file

2019-04-09 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90034

--- Comment #1 from Andrew Pinski  ---
wait4 is waiting for the child process to finish.  You need to run strace with
the -f option to follow the forks.
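
As an illustration of the point above, here is a minimal sketch of a parent
that creates a child and blocks until it exits, which is the situation the
strace output shows for the gcc driver and cc1. The function name
run_child_and_wait is hypothetical, and fork()/waitpid() are used in place of
the driver's vfork()/wait4() to keep the example simple. Without `strace -f`
only the parent's blocked wait is visible; whatever the child is stuck on is
not traced.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: create a child that exits with `code`, then block
// until it finishes. Returns the child's exit status, or -1 on error.
int run_child_and_wait(int code) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 // fork failed
    }
    if (pid == 0) {
        _exit(code);               // child: a real driver would exec cc1 here
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;                 // wait failed
    }
    // The parent only gets here once the child has terminated.
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child never terminates, the parent stays inside the wait call, which
matches the truncated `wait4(16237, ` line in the hanging trace.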

[Bug c/90034] New: gcc hangs on wait4 after vfork after opening tmp file

2019-04-09 Thread todd.freed at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90034

Bug ID: 90034
   Summary: gcc hangs on wait4 after vfork after opening tmp file
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: todd.freed at gmail dot com
  Target Milestone: ---

Created attachment 46123
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46123&action=edit
causes the hang

Here is the command:

> dd if=bug_input.c bs=1 count=3051 2>/dev/null | gcc -xc - # hangs

The hang seems to be specific to this input somehow.

If I pass any count < 3051, it does not hang. For any count >= 3051, it hangs.

If I pass an offset of any kind, it does not hang.

If I take those 3050 bytes and put them in a file instead of passing them to
stdin via a pipe, it still hangs.

> gcc -bug_input_0_3050.c # hangs

I've attached the entire file anyway, for context. The file was generated by
GNU bison.

---

strace snippet from the hang

 . . .
getpid() = 16236
openat(AT_FDCWD, "/tmp/ccPYKe6J.s", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
close(3) = 0
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/8.2.1/cc1", {st_mode=S_IFREG|0755,
st_size=26001000, ...}) = 0
access("/usr/lib/gcc/x86_64-pc-linux-gnu/8.2.1/cc1", X_OK) = 0
vfork() = 16237
wait4(16237, 

---

strace snippet from a non-hang

 . . .
getpid() = 16578
openat(AT_FDCWD, "/tmp/ccBrWed1.s", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
close(3) = 0
stat("/usr/lib/gcc/x86_64-pc-linux-gnu/8.2.1/cc1", {st_mode=S_IFREG|0755,
st_size=26001000, ...}) = 0
access("/usr/lib/gcc/x86_64-pc-linux-gnu/8.2.1/cc1", X_OK) = 0
vfork() = 16579
wait4(16579, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16579
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16579, si_uid=1000,
si_status=0, si_utime=0, si_stime=0} ---
getpid() = 16578
openat(AT_FDCWD, "/tmp/ccXHF56n.o", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
close(3) = 0
 . . .

---

Version Info:

todd@euclid ~/bison
0 master % uname -a
Linux euclid 5.0.4-arch1-1-ARCH #1 SMP PREEMPT Sat Mar 23 21:00:33 UTC 2019
x86_64 GNU/Linux

todd@euclid ~/bison
0 master % pacman -Q gcc
gcc 8.2.1+20181127-1

todd@euclid ~/bison
0 master % gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/8.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib
--libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --enable-libmpx --with-system-zlib --with-isl
--enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu
--disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu
--enable-gnu-indirect-function --enable-multilib --disable-werror
--enable-checking=release --enable-default-pie --enable-default-ssp
--enable-cet=auto
Thread model: posix
gcc version 8.2.1 20181127 (GCC)


NOTE : This also repro's on another of my dev machines, with gcc version 8.2.1
20180831 (GCC)

[Bug c/448] <stdint.h>-related issues (C99 issues)

2019-04-09 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=448

Eric Gallager  changed:

   What|Removed |Added

 Target|netbsd, SymbianOS, LynxOS,  |SymbianOS, LynxOS, QNX, TPF
   |QNX, TPF|

--- Comment #45 from Eric Gallager  ---
(In reply to coypu from comment #44)
> (In reply to jos...@codesourcery.com from comment #31)
> > GCC: some NetBSD targets (netbsd-stdint.h only used for x86 / x86_64), 
> 
> Speaking for NetBSD only:
> as of https://gcc.gnu.org/viewcvs/gcc?view=revision=253323 , we
> include netbsd-stdint.h for all netbsd targets.

OK, removing it from the target list then.

[Bug c/90027] misalign variable access by piece load/store even when define STRICT_ALIGNMENT nonzero

2019-04-09 Thread zhongyunde at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90027

--- Comment #2 from vfdff  ---
For the DejaGnu testcase gcc.c-torture/execute/20010518-2.c:
since struct a_struct is defined with __attribute__ ((packed)), the member
variable b is also not aligned to 4 bytes. Is this case undefined behavior?


typedef struct
{
  short a;
  long b;  /* Will not be aligned to 4 bytes */
  short c;
  short d;
} __attribute__ ((packed)) a_struct;


int
main(void)
{
  volatile a_struct *a;
  volatile a_struct b;

  a = &b;
  *a = (a_struct){1,2,3,4};


  if (a->a != 1 || a->b != 2 || a->c != 3 || a->d != 4)
    abort ();

  exit (0);
}
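
To make the layout question concrete, the sketch below uses a hypothetical
stand-in for a_struct (packed_sketch, with fixed-width members so the numbers
do not depend on the size of long) and reports where member b ends up. It
assumes a compiler that supports the GCC/Clang `__attribute__((packed))`
extension. Accessing b through the packed struct type lets the compiler see
the reduced alignment, which is a different situation from dereferencing a
plain misaligned pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a_struct; not the testsuite's definition.
struct packed_sketch {
    std::int16_t a;
    std::int32_t b;   // starts at byte offset 2, so it is not 4-byte aligned
    std::int16_t c;
    std::int16_t d;
} __attribute__((packed));

// Byte offset of member b inside the packed struct.
std::size_t offset_of_b() {
    return offsetof(packed_sketch, b);
}

// Read b through the packed struct type, so the compiler knows the member
// may be misaligned and can choose a suitable access sequence.
std::int32_t read_b(const packed_sketch& s) {
    return s.b;
}
```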

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread fink at snaggledworks dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

fink at snaggledworks dot com changed:

   What|Removed |Added

 CC||fink at snaggledworks dot com

--- Comment #52 from fink at snaggledworks dot com ---
(In reply to Iain Sandoe from comment #43)
> Created attachment 46110 [details]
> Proof-of-principle path
> 
> Does this work for you?
>  - my local testing says it generates the right wrapped include file.
> 
> (perhaps the constraint on darwin version was too tight in Erik's case)

I applied this patch to the 8.3.0 source as built using the Fink gcc8 package
(with minor tweaks so it would apply cleanly), and gcc-8.3.0 now built fine on
macOS 10.14.4 with Xcode10.2.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-09 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #25 from Steve Kargl  ---
On Tue, Apr 09, 2019 at 08:24:29PM +0000, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
> 
> --- Comment #24 from Jonathan Wakely  ---
> Thanks for the patch, I'll test it fully tomorrow.
> 

I think the patch for complex_sqrt() is correct.  The one
for complex_pow(), I think, accidentally works for the OP, but is
likely broken for some general regions of the complex plane.

> I'll open a separate bug for the FreeBSD issue. We could use more fine-grained
> configure checks so that most C99 math functions are enabled, even if some of
> the complex ones are missing.

libgfortran has c99_functions.c, which implements missing C99 math
functions when configure cannot find one.  The implementations
are likely to be fairly direct, without much optimization or
attention to exceptional cases, and may not be extensively tested.

[Bug c++/90033] New: [concepts] ICE segfault evaluating a requires clause that transitively depends on itself

2019-04-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90033

Bug ID: 90033
   Summary: [concepts] ICE segfault evaluating a requires clause
that transitively depends on itself
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/SEfFol

This is a creduce'd example that tripped a segfault in our Real World Code
implementation of unique_function. The RWC version includes a requirement that
X != Y so this can never be a copy or move constructor, but that was removed in
the reduction. FWIW, the clang concepts fork compiles this successfully.

template <bool B>
struct bool_constant { static constexpr bool value = B; };
template <typename T, typename... Args>
struct is_constructible : bool_constant<__is_constructible(T, Args...)> {};
template <typename T>
T&& move(T&);

struct X {
  template <typename OtherFunc>
  requires(is_constructible::value)
  X(OtherFunc &&);

  X() = default;
};

X source;
X dest = move(source);

-

: In substitution of 'template  requires 
is_constructible::value X::X(OtherFunc&&) [with OtherFunc
= X]':
:16:21:   required from here
:16:21: internal compiler error: Segmentation fault
   16 | X dest = move(source);
  | ^
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.


I know concepts are still experimental, but if the fix turns out to be simple,
we'd appreciate a backport to gcc8.
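
The report mentions that the real code also requires X != Y so that the
constrained constructor can never act as a copy or move constructor. A minimal
sketch of that kind of guard follows. It is an assumption about the general
idiom, not the reporter's code: it uses std::enable_if rather than a requires
clause, and the names wrapper_sketch, from_callable, moved_via_template and
built_from_int are hypothetical.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical unique_function-like wrapper.
struct wrapper_sketch {
    wrapper_sketch() = default;
    wrapper_sketch(const wrapper_sketch&) = default;
    wrapper_sketch(wrapper_sketch&&) = default;

    // Converting constructor, removed from overload resolution when F
    // (ignoring references and cv-qualifiers) is wrapper_sketch itself.
    template <typename F,
              typename = typename std::enable_if<
                  !std::is_same<typename std::decay<F>::type,
                                wrapper_sketch>::value>::type>
    wrapper_sketch(F&&) : from_callable(true) {}

    bool from_callable = false;   // true only when the template was used
};

// Moving a wrapper selects the defaulted move constructor, not the template.
bool moved_via_template() {
    wrapper_sketch source;
    wrapper_sketch dest = std::move(source);
    return dest.from_callable;
}

// Any other argument type still goes through the converting constructor.
bool built_from_int() {
    wrapper_sketch w(42);
    return w.from_callable;
}
```

With such a guard the constraint is never evaluated for the wrapper's own
type, so the self-referential check from the reduced example does not arise.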

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-09 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763

--- Comment #44 from Segher Boessenkool  ---
(In reply to Jeffrey A. Law from comment #43)
> The problem with your suggestions Segher is that we'd have to do them for
> every target which defines insns with a zero_extract destination and that's
> been the well understood way to handle this stuff for over 2 decades.

It has only worked in some cases and not in others, for all of those decades.
And what cases those are exactly changes with the phase of the moon, well, with
any otherwise irrelevant change.

This is part of the reason why rs6000 doesn't have insv patterns any more,
btw (since r226005).  (The other part is that our rl*imi insns can only in
very limited cases be described with insv).

> Improving combine avoids that problem.

Sure, but combine just gives up for RMW insns in many cases (and it has to).
Some other passes do the same thing, I would think?  Using the same pseudo
for two things causes problems.

> Of course we have to balance the
> pros/cons of any patch in that space as well which is hard to do without an
> official patch to evaluate.  What I've got is just proof of concept for the
> most common case, but it does show some promise.

Oh, I'm not against any such patch /per se/, if it is safe and suitable for
stage 4, and an improvement (not a regression for some targets), I'll okay
it of course.  

> Also note that Steve's patch just addresses combine_bfi IIUC.  My POC
> addresses insv_?.c as well as the existing combine_bfi test (but I haven't
> tested it against the deeper tests in Steve's patch.

[Bug rtl-optimization/90032] [MSP430] reload uses wrong stack slot for variable after setjmp/longjmp

2019-04-09 Thread jozefl.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90032

--- Comment #4 from Jozef Lawrynowicz  ---
Created attachment 46122
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46122&action=edit
tester.s

[Bug rtl-optimization/90032] [MSP430] reload uses wrong stack slot for variable after setjmp/longjmp

2019-04-09 Thread jozefl.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90032

--- Comment #3 from Jozef Lawrynowicz  ---
Created attachment 46121
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46121&action=edit
tester.i reload dump

[Bug rtl-optimization/90032] [MSP430] reload uses wrong stack slot for variable after setjmp/longjmp

2019-04-09 Thread jozefl.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90032

--- Comment #2 from Jozef Lawrynowicz  ---
Created attachment 46120
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46120&action=edit
tester.i ira dump

[Bug rtl-optimization/90032] [MSP430] reload uses wrong stack slot for variable after setjmp/longjmp

2019-04-09 Thread jozefl.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90032

Jozef Lawrynowicz  changed:

   What|Removed |Added

  Attachment #46118|0   |1
is obsolete||

--- Comment #1 from Jozef Lawrynowicz  ---
Created attachment 46119
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46119&action=edit
tester.i

[Bug rtl-optimization/90032] New: [MSP430] reload uses wrong stack slot for variable after setjmp/longjmp

2019-04-09 Thread jozefl.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90032

Bug ID: 90032
   Summary: [MSP430] reload uses wrong stack slot for variable
after setjmp/longjmp
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jozefl.gcc at gmail dot com
  Target Milestone: ---

Created attachment 46118
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46118&action=edit
tester.i

gcc.dg/torture/stackalign/setjmp-1.c fails at execution for msp430-elf at -O1.
An RTL optimization added in r255136 exposed the failure, but if I make this
optimization in the source of the test, then versions of GCC before this also
fail, back to GCC 7. It passes with gcc-6.4, but again that is just because the
RTL at reload is different.

I've attached a reduced testcase (tester.i) based on setjmp-1.c.
> msp430-elf-gcc -O1 -msim tester.i
The failure occurs because the wrong stack slot is used as the first argument
to strcmp.
R1 is the stack pointer, R12 stores the first argument to functions.
>   MOV.W   R12, 20(R1); The address of the string is stored in 20(R1)
>   MOV.W   #25972, @R12   ; "te"
>   MOV.W   #29811, 2(R12) ; "st"
> (No other modifications of R1)
> preparing jmp_buf
>   CALL#sub2
> Label for return from longjmp
>   MOV.W   @R1, R12  ; The address of the string is actually in 20(R1)
>   CALL#strcmp
Whether the test fails or not seems gated on whether frame_pointer_needed == true
for main(). When there is "more" RTL code (i.e. before the revisions that added
the problematic optimizations), then frame_pointer_needed == true so the
address of the "test" string will be used as an offset from the frame pointer,
instead of the stack pointer.

TARGET_FRAME_POINTER_REQUIRED () always returns false for msp430, but if I make
it return true if
  (cfun->has_nonlocal_label || cfun->calls_setjmp)
then the test passes and the following code is generated.

The frame pointer is R4, but it is not fixed.
>   MOV.W   R12, -2(R4)
>   MOV.W   #25972, @R12
>   MOV.W   #29811, 2(R12)
>  (No other modifications of R4)
>   CALL#sub2
>  Label for return from longjmp
>   MOV.W   -2(R4), R12  ; -2(R4) contains the correct address of "test"

I've attached the assembly file, IRA and reload dumps for tester.i when
compiled with current trunk.
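
For readers without the attachments, the following is a minimal sketch of the
kind of check described above: a string is written to a local buffer, control
returns through longjmp, and the buffer is then compared with strcmp. It is
not the attached tester.i; the names env_sketch, jump_back and
strcmp_after_longjmp are hypothetical. The buffer is not modified between
setjmp and longjmp, so its contents must still be intact afterwards.

```cpp
#include <csetjmp>
#include <cstring>

static std::jmp_buf env_sketch;

// Stands in for the callee (sub2 in the listing) that jumps back to main.
static void jump_back() {
    std::longjmp(env_sketch, 1);
}

// Returns 0 when the local string is still intact after the longjmp.
int strcmp_after_longjmp() {
    char buf[8];
    std::strcpy(buf, "test");      // written before setjmp, never modified after

    if (setjmp(env_sketch) == 0) {
        jump_back();               // does not return here
        return -1;
    }
    // Second return from setjmp: the address of buf must be reloaded from
    // the same stack slot it was stored in before the jump.
    return std::strcmp(buf, "test");
}
```

On a correct compiler the function returns 0; the miscompilation described in
the report corresponds to strcmp receiving an address from the wrong slot.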

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #24 from Jonathan Wakely  ---
Thanks for the patch, I'll test it fully tomorrow.

I'll open a separate bug for the FreeBSD issue. We could use more fine-grained
configure checks so that most C99 math functions are enabled, even if some of
the complex ones are missing.

[Bug c/90027] misalign variable access by piece load/store even when define STRICT_ALIGNMENT nonzero

2019-04-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90027

Eric Botcazou  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||ebotcazou at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Eric Botcazou  ---
> A test cases:
> int foo ()
> {
>volatile int *pswData = 2;  /* point to address 2 is not aligned with 4
> bytes.  */
>return *pswData - *pswData;
> }

That precisely makes it undefined behavior, see 6.5.3.2:

"The unary * operator denotes indirection. If the operand points to a function,
the result is a function designator; if it points to an object, the result is
an lvalue designating the object. If the operand has type ‘‘pointer to type’’,
the result has type ‘‘type’’. If an invalid value has been assigned to the
pointer, the behavior of the unary * operator is undefined(102)."

"(102) Among the invalid values for dereferencing a pointer by the unary *
operator are a null pointer, an address inappropriately aligned for the type of
object pointed to, and the address of an object after the end of its lifetime."
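
As a side note to the quoted standard text, a common way to read or write a
value at an address that may be misaligned, without dereferencing a misaligned
pointer, is to copy the bytes with memcpy. The sketch below assumes ordinary
memory; it does not provide the volatile access semantics of the test case in
the report, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Load a 32-bit value from an address that need not be 4-byte aligned.
std::int32_t load_i32_unaligned(const unsigned char* p) {
    std::int32_t v;
    std::memcpy(&v, p, sizeof v);  // byte-wise copy, no alignment requirement
    return v;
}

// Store a 32-bit value to an address that need not be 4-byte aligned.
void store_i32_unaligned(unsigned char* p, std::int32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```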

[Bug libstdc++/89851] [9 Regression] std::variant comparison operators violate [variant.relops]

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89851

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jonathan Wakely  ---
Done

[Bug c++/90031] New: Bogus parse error trying to explicitly specialize a template variable inside class scope

2019-04-09 Thread redbeard0531 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90031

Bug ID: 90031
   Summary: Bogus parse error trying to explicitly specialize a
template variable inside class scope
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/y5GQZd

struct Struct {
template 
constexpr static bool use_cond = false;
template 
constexpr static bool use_cond = true;
};

:5:27: error: explicit template argument list not allowed
5 | constexpr static bool use_cond = true;
  |   ^
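
Until class-scope explicit specializations are accepted, one possible
workaround is to declare the specialization at namespace scope instead. The
sketch below is an assumption for illustration only: the template parameter
(`int N`), the specialized argument (`0`) and the names flags_sketch,
cond_for_zero and cond_for_one are hypothetical and are not taken from the
report. It requires C++14 or later for variable templates.

```cpp
// Hypothetical stand-in for Struct from the report.
struct flags_sketch {
    template <int N>
    static constexpr bool use_cond = false;   // primary variable template
};

// Explicit specialization written at namespace scope rather than inside
// the class body.
template <>
constexpr bool flags_sketch::use_cond<0> = true;

constexpr bool cond_for_zero() { return flags_sketch::use_cond<0>; }
constexpr bool cond_for_one()  { return flags_sketch::use_cond<1>; }
```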

[Bug libstdc++/89851] [9 Regression] std::variant comparison operators violate [variant.relops]

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89851

Jonathan Wakely  changed:

   What|Removed |Added

   Priority|P1  |P3
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-04-09
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Jonathan Wakely  ---
The regression was introduced with r269422 and fixed with r270056.

I'll add the testcase to the testsuite and close this.

[Bug target/89093] [9 Regression] C++ exception handling clobbers d8 VFP register

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89093

--- Comment #35 from Jonathan Wakely  ---
(In reply to Bernd Edlinger from comment #33)
> (In reply to Ramana Radhakrishnan from comment #32)
> > 
> > Either I drop the warning or I keep the hunk in eh_personality.cc - any
> > preferences / thoughts ?
> 
> It would feel safer, if only the functions that need it
> had a target attribute like:
> 
> _Unwind_Reason_Code
> #ifdef __ARM_EABI_UNWINDER__
> __attribute__((target("general-regs-only")))
> PERSONALITY_FUNCTION (_Unwind_State state,
>   struct _Unwind_Exception* ue_header,
>   struct _Unwind_Context* context)

Agreed - will this work instead?

[Bug d/88150] Use sections_elf_shared.d on Solaris

2019-04-09 Thread ro at CeBiTec dot Uni-Bielefeld.DE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88150

--- Comment #12 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
I've now reworked my non-dlpi_tls_modid patch to include this after
Solaris 11.[345]/x86 testing gave excellent and pretty much identical
test results:

https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00354.html

[Bug libstdc++/90008] [9 Regression] variant attempts to copy rhs in comparison operators

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90008

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jonathan Wakely  ---
Fixed for GCC 9.1

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #51 from Iain Sandoe  ---
(In reply to Jürgen Reuter from comment #50)
> (In reply to Jakub Jelinek from comment #48)
> > Perhaps that redefinition of _Atomic should be guarded with
> > #if (__STDC_VERSION__ < 201112L) || defined(__cplusplus)
> > or so, so that for C -std=c11 you still get _Atomic?
> 
> So shall I wait for this to test the fix?

no, please check that we have a basic fix, we can polish it once we know more
about the situation and the possible way forward.

[Bug c++/90029] optimizing local exceptions, or are they an observable side effect

2019-04-09 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90029

Martin Sebor  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |RESOLVED
 CC||msebor at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=64501
 Resolution|--- |DUPLICATE
   Severity|normal  |enhancement

--- Comment #1 from Martin Sebor  ---
Exceptions whose effects aren't observable could be optimized away.  I think
this is also what pr53294 requests.  See also pr64501.

*** This bug has been marked as a duplicate of bug 53294 ***

[Bug c++/53294] Optimize out some exception code

2019-04-09 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53294

Martin Sebor  changed:

   What|Removed |Added

 CC||federico.kircheis at gmail dot 
com

--- Comment #4 from Martin Sebor  ---
*** Bug 90029 has been marked as a duplicate of this bug. ***

[Bug target/90028] On Intel Skylake (-march=native) generated avx512 instruction can be wrong

2019-04-09 Thread ferruh.yigit at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028

--- Comment #5 from Ferruh YIGIT  ---
Tested with latest gcc [1], same output.

[1] Compiled from source:
gcc (GCC) 9.0.1 20190409 (experimental)

[Bug libstdc++/90008] [9 Regression] variant attempts to copy rhs in comparison operators

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90008

--- Comment #2 from Jonathan Wakely  ---
Author: redi
Date: Tue Apr  9 18:50:39 2019
New Revision: 270236

URL: https://gcc.gnu.org/viewcvs?rev=270236&root=gcc&view=rev
Log:
PR libstdc++/90008 remove unused capture from variant rel ops

PR libstdc++/90008
* include/std/variant (_VARIANT_RELATION_FUNCTION_TEMPLATE): Remove
unused capture.
* testsuite/20_util/variant/90008.cc: New test.

Added:
trunk/libstdc++-v3/testsuite/20_util/variant/90008.cc
Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/include/std/variant

[Bug fortran/90030] New: Fortran OpenACC subarray data alignment

2019-04-09 Thread tschwinge at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90030

Bug ID: 90030
   Summary: Fortran OpenACC subarray data alignment
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: openacc
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

As reported by Cesar in
, and later
re-submitted in .

> In both OpenACC and OpenMP, each subarray has at least two data mappings
> associated with them, one for the pointer and another for the data in
> the array section (fortan also has a pset mapping). One problem I
> observed in fortran is that array section data is casted to char *.
> Consequently, when lower_omp_target assigns alignment for the subarray
> data, it does so incorrectly. This is a problem on nvptx if you have a
> data clause such as
> 
>   integer foo
>   real*8 bar (100)
>   
>   !$acc data copy (foo, bar(1:100))
> 
> Here, the data associated with bar could get aligned on a 4 byte
> boundary instead of 8 byte. That causes problems on nvptx targets.
> 
> My fix for this is to prevent the fortran front end from casting the
> data pointers to char *. I only prevented casting on the code which
> handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
> also get casted to char *, but I left those as-is because I'm not that
> familiar with how non-OpenMP target regions get lowered.

[Bug target/90028] On Intel Skylake (-march=native) generated avx512 instruction can be wrong

2019-04-09 Thread ferruh.yigit at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028

--- Comment #4 from Ferruh YIGIT  ---
Created attachment 46117
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46117&action=edit
.s file generated by "--save-temps" param

[Bug target/90028] On Intel Skylake (-march=native) generated avx512 instruction can be wrong

2019-04-09 Thread ferruh.yigit at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028

--- Comment #3 from Ferruh YIGIT  ---
Created attachment 46116
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46116&action=edit
.i file generated by "--save-temps" param

[Bug c++/90029] New: optimizing local exceptions, or are they an observable side effect

2019-04-09 Thread federico.kircheis at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90029

Bug ID: 90029
   Summary: optimizing local exceptions, or are they an observable
side effect
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: federico.kircheis at gmail dot com
  Target Milestone: ---

Hello, 

this is partly a question, and partly a feature request

Consider following functions


int bar_ex_noexcept(int i) noexcept {
    struct foo{};
    try {
        if(i<0){
            throw foo{};
        }
        return i;
    }catch(foo){
        return -1;
    }
}

int bar_ex(int i) {
    struct foo{};
    try {
        if(i<0){
            throw foo{};
        }
        return i;
    }catch(foo){
        return -1;
    }
}

int bar_ret(int i) noexcept {
    if(i<0){
        return -1;
    }
    return i;
}

int bar_goto(int i) noexcept {
    if(i<0){
        goto ex;
    }
    return i;
    ex: return -1;
}


All these functions, unless I overlooked something, do exactly the same thing
with different control structures (goto, exception, early return): they return
the given value i if it is positive or 0, and -1 if it is less than 0.
gcc is smart, and all functions, except those that use an exception as an
implementation detail, generate the same assembly (tested with `-O3`).
`bar_ex_noexcept` also has a call to `std::terminate`, even though it will never
be executed (example here: https://godbolt.org/z/XVtgXG).
Since `foo` is defined locally to `bar_ex`/`bar_ex_noexcept`, I expected that
gcc would be able to optimize away the throw and catch clauses completely
(the calls to `__cxa_allocate_exception` and `__cxa_throw`), but that is not
the case.
The conditional call to `std::terminate` in `bar_ex_noexcept` also surprised
me, since the only thrown exception is always caught and ignored.

Therefore my question: are exceptions an observable behavior, so that the
compiler can't optimize them away completely, or is gcc simply "not smart
enough" to do those optimizations?

If they are not observable, the feature request would be to
 * optimize local exceptions
 * thus remove typeinfo information for types that can never escape the local
scope (since if the exceptions are optimized away, that information is not used
by anyone)
 * remove dead calls to `std::terminate`
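
The claim that the exception-based and early-return variants compute the same
result can be checked with a small sketch like the one below. It restates two
of the functions under hypothetical names (bar_ex_sketch, bar_ret_sketch) and
compares them over a range; it says nothing about the generated assembly,
which is what the request is about.

```cpp
// Exception-based variant: the exception type is local to the function.
int bar_ex_sketch(int i) {
    struct foo {};
    try {
        if (i < 0) {
            throw foo{};
        }
        return i;
    } catch (foo) {
        return -1;
    }
}

// Early-return variant with the same observable result.
int bar_ret_sketch(int i) noexcept {
    return i < 0 ? -1 : i;
}

// True when both variants agree for every value in [lo, hi].
bool variants_agree(int lo, int hi) {
    for (int i = lo; i <= hi; ++i) {
        if (bar_ex_sketch(i) != bar_ret_sketch(i)) {
            return false;
        }
    }
    return true;
}
```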

[Bug target/90028] On Intel Skylake (-march=native) generated avx512 instruction can be wrong

2019-04-09 Thread ferruh.yigit at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028

--- Comment #2 from Ferruh YIGIT  ---
While preparing the support files for this report via --save-temps, I noticed
that the generated .s file output is a little different, and correct, assuming
the suspicion about the source of the failure was right:

        movl    $-1, %edx
        salq    $5, %rax
        xorl    %ecx, %ecx
        kmovb   %edx, %k1
        .p2align 4,,10
        .p2align 3
.L540:
        vmovdqu64   (%rsi,%rcx), %ymm1
        kmovb   %k1, %k2
        vpgatherqq  8(,%ymm1,1), %ymm0{%k2}
        kmovb   %k1, %k3
        vpaddq  %ymm1, %ymm0, %ymm0
        vpgatherqq  0(,%ymm1,1), %ymm2{%k3}
        vpsubq  %ymm2, %ymm0, %ymm0
        vmovdqu64   %ymm0, (%r8,%rcx)


It has "vpgatherqq  8 ..."

Attaching .s and .i files.


Does this mean the problem is in the assembler?

/usr/bin/as --version
GNU assembler version 2.31.1-24.fc29

[Bug target/90028] On Intel Skylake (-march=native) generated avx512 instruction can be wrong

2019-04-09 Thread ferruh.yigit at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028

--- Comment #1 from Ferruh YIGIT  ---
Created attachment 46115
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46115&action=edit
19.05-rc1 -mno-avx512f gcc build on skylake

The build is done by changing lib/librte_kni/Makefile as follows:

+ CFLAGS += -mno-avx512f

[Bug c/90028] New: On Intel Skylake (-march=native) generated avx512 instruction can be wrong

2019-04-09 Thread ferruh.yigit at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028

Bug ID: 90028
   Summary: On Intel Skylake (-march=native) generated avx512
instruction can be wrong
   Product: gcc
   Version: 8.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ferruh.yigit at intel dot com
  Target Milestone: ---

Created attachment 46114
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46114&action=edit
19.05-rc1 default gcc build on skylake

gcc version:
gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)

binutils:
GNU ld version 2.31.1-24.fc29

This is observed in the dpdk project (https://git.dpdk.org/dpdk/tree/?h=v19.05-rc1)
on an Intel Skylake CPU.

Full build command (removed -I & -D ones):
gcc -Wp,-MD,./.rte_kni.o.d.tmp  -m64 -pthread -march=native -W -Wall
-Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations
-Wold-style-definition -Wpointer-arith -Wcast-align -Wnested-externs
-Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef -Wwrite-strings
-Wdeprecated -Werror -Wimplicit-fallthrough=2 -Wno-format-truncation -O3
-fno-strict-aliasing -o rte_kni.o -c
/root/development/dpdk-next-net/lib/librte_kni/rte_kni.c 


When the related code is built with the "-mno-avx512f" flag, the problem goes
away. Also, the clang (clang version 7.0.1 (Fedora 7.0.1-6.fc29)) output works fine.


The 'vpgatherqq' instruction usage is the suspect.

The related .c code is
(https://git.dpdk.org/dpdk/tree/lib/librte_kni/rte_kni.c?h=v19.05-rc1#n546):

"
  static void *
  va2pa(struct rte_mbuf *m)
  {
          return (void *)((unsigned long)m -
                          ((unsigned long)m->buf_addr -
                           (unsigned long)m->buf_iova));
  }

  unsigned
  rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
  {
          void *phy_mbufs[num];
          unsigned int ret;
          unsigned int i;

          for (i = 0; i < num; i++)
                  phy_mbufs[i] = va2pa(mbufs[i]);

"

'm->buf_addr' & 'm->buf_iova' are next to each other in the struct, so there is
an 8-byte difference between their addresses.

Generated asm code:
avx512 enabled code snippet:

 232c:   ba ff ff ff ff          mov    $0xffffffff,%edx
 2331:   48 c1 e0 05             shl    $0x5,%rax
 2335:   31 c9                   xor    %ecx,%ecx
 2337:   c5 f9 92 ca             kmovb  %edx,%k1
 233b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
 2340:   62 f1 fe 28 6f 0c 0e    vmovdqu64 (%rsi,%rcx,1),%ymm1
 2347:   c5 f9 90 d1             kmovb  %k1,%k2
*234b:   62 f2 fd 2a 91 04 0d    vpgatherqq 0x1(,%ymm1,1),%ymm0{%k2}
 2352:   01 00 00 00
 2356:   c5 f9 90 d9             kmovb  %k1,%k3
 235a:   c5 fd d4 c1             vpaddq %ymm1,%ymm0,%ymm0
 235e:   62 f2 fd 2b 91 14 0d    vpgatherqq 0x0(,%ymm1,1),%ymm2{%k3}
 2365:   00 00 00 00
 2369:   c5 fd fb c2             vpsubq %ymm2,%ymm0,%ymm0
 236d:   62 d1 fe 28 7f 04 08    vmovdqu64 %ymm0,(%r8,%rcx,1)


same code avx512 disabled (avx2) code snippet:

 2332:   c5 ed 76 d2             vpcmpeqd %ymm2,%ymm2,%ymm2
 2336:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 233d:   00 00 00
 2340:   c5 fe 6f 0c 0e          vmovdqu (%rsi,%rcx,1),%ymm1
 2345:   c5 fd 6f e2             vmovdqa %ymm2,%ymm4
 2349:   c4 e2 dd 91 04 0d 08    vpgatherqq %ymm4,0x8(,%ymm1,1),%ymm0
 2350:   00 00 00
 2353:   c5 fd 6f ea             vmovdqa %ymm2,%ymm5
 2357:   c4 e2 d5 91 1c 0d 00    vpgatherqq %ymm5,0x0(,%ymm1,1),%ymm3
 235e:   00 00 00
 2361:   c5 fd d4 c1             vpaddq %ymm1,%ymm0,%ymm0
 2365:   c5 fd fb c3             vpsubq %ymm3,%ymm0,%ymm0
 2369:   c4 c1 7e 7f 04 08       vmovdqu %ymm0,(%r8,%rcx,1)


full asm outputs are attached.

In the avx512 one, for 'vpgatherqq', it looks like the offset should be 0x8
instead of 0x1.
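
As a cross-check of the expected displacement, the arithmetic can be reproduced
in plain C. The sketch below assumes an LP64 target and uses a made-up
struct fake_mbuf that models only the two adjacent fields; it is not the real
DPDK struct rte_mbuf layout, and the helper names are invented for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of struct rte_mbuf: only the two
   adjacent fields used by va2pa() are modelled here.  */
struct fake_mbuf {
    void *buf_addr;      /* virtual address of the buffer */
    uint64_t buf_iova;   /* IO address of the buffer */
};

/* Same arithmetic as va2pa() in rte_kni.c, done on uintptr_t.  */
static void *va2pa_sketch(struct fake_mbuf *m)
{
    assert(m != NULL);
    return (void *)((uintptr_t)m -
                    ((uintptr_t)m->buf_addr - (uintptr_t)m->buf_iova));
}

/* Distance in bytes between the two fields, i.e. the difference one
   would expect between the displacements of the two gathers.  */
static size_t field_distance(void)
{
    return offsetof(struct fake_mbuf, buf_iova) -
           offsetof(struct fake_mbuf, buf_addr);
}
```

Under the LP64 assumption field_distance() is 8, which matches the 0x8
displacement seen in the avx2 listing.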

[Bug tree-optimization/65930] Reduction with sign-change not handled

2019-04-09 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65930

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #10 from Tamar Christina  ---
Hi Richard,

Do you still plan on working on this? Otherwise I'd like to add it to my list
of things to do for GCC 10.

[Bug tree-optimization/88915] Try smaller vectorisation factors in scalar fallback

2019-04-09 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88915

Tamar Christina  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #3 from Tamar Christina  ---
I'll be taking a look at this one as a part of GCC 10 as well.

[Bug tree-optimization/88259] vectorization failure for a typical loop for getting max value and index

2019-04-09 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88259

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||tnfchris at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #5 from Tamar Christina  ---
I'll be taking a look at this one as a part of GCC 10 as well.

[Bug tree-optimization/88492] SLP optimization generates ugly code

2019-04-09 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||tnfchris at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #3 from Tamar Christina  ---
I'll be taking a look at this one as a part of GCC 10 as well.

[Bug tree-optimization/86530] Vectorization failure for a simple loop

2019-04-09 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||tnfchris at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #3 from Tamar Christina  ---
I'll take this one as part of GCC10.

[Bug tree-optimization/86504] vectorization failure for a nest loop

2019-04-09 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86504

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #5 from Tamar Christina  ---
Hi Richard,

Do you still plan on working on this? Otherwise I'd like to take it over for
GCC10.

[Bug c/90027] New: misalign variable access by piece load/store even when define STRICT_ALIGNMENT nonzero

2019-04-09 Thread zhongyunde at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90027

Bug ID: 90027
   Summary: misalign variable access by piece load/store even when
define STRICT_ALIGNMENT nonzero
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

Based on gcc 7.3.0, in function expand_expr_real_1 we can see the following code:
 else if (SLOW_UNALIGNED_ACCESS (mode, align))
  temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
0, TYPE_UNSIGNED (TREE_TYPE (exp)),
(modifier == EXPAND_STACK_PARM
 ? NULL_RTX : target),
mode, mode, false);

This means that even when we define STRICT_ALIGNMENT as 1, it may still extract
the store insns with narrow alignment.
The GCC Internals manual says that SLOW_UNALIGNED_ACCESS should produce a
nonzero value when STRICT_ALIGNMENT is nonzero.

A test case:
int foo ()
{
   /* Address 2 is not aligned to 4 bytes.  */
   volatile int *pswData = (volatile int *) 2;
   return *pswData - *pswData;
}
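
For comparison, a by-piece access at the C level looks roughly like the sketch
below. This is only an illustration of reading a 4-byte value without any
access wider than a byte; it is not the RTL that expand_expr_real_1 emits, the
helper name is made up here, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value one byte at a
   time, so no access wider than a byte touches the unaligned address.  */
static uint32_t load_u32_le_bytewise(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```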

[Bug target/89794] combine incorrectly forwards register value through auto-inc operation

2019-04-09 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89794

--- Comment #6 from Richard Earnshaw  ---
There seems to be more to this than initially thought.  Another insn is in
play.

(insn 12 10 14 2 (set (reg:SI 129)
(bswap:SI (subreg:SI (reg:DI 127 [ i ]) 4))) "/tmp/test3.c":10:7 331
{*arm_rev}
 (expr_list:REG_DEAD (reg:DI 127 [ i ])
(nil)))

Which uses the value loaded by the pre-modify instruction.

Combine manages to combine (and simplify) insns 10 and 12, but the
simplification is to

(set (reg:SI 129) (const_int 0))

and we've lost the pre-inc entirely.

[Bug target/89965] [8/9 Regression] wrong code with -O -mtune=nano-x2 -fcaller-saves -fexpensive-optimizations -fno-tree-dce -fno-tree-ter

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89965

--- Comment #9 from Jakub Jelinek  ---
So, we have:
(insn 41 40 42 6 (parallel [
(set (reg/v:DI 101 [ i ])
(lshiftrt:DI (reg/v:DI 118 [ i ])
(const_int 7 [0x7])))
(clobber (reg:CC 17 flags))
]) "pr89965.c":9 574 {*lshrdi3_doubleword}
 (expr_list:REG_DEAD (reg/v:DI 118 [ i ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (mem:DI (reg/f:SI 7 sp) [0  S8 A32])
(nil)
(insn 42 41 43 6 (set (reg:QI 89 [ _3 ])
(subreg:QI (reg/v:DI 101 [ i ]) 0)) "pr89965.c":10 88 {*movqi_internal}
 (nil))
(insn 43 42 44 6 (parallel [
(set (reg/v:QI 102 [ c ])
(mult:QI (reg:QI 89 [ _3 ])
(subreg:QI (reg:SI 111 [ c ]) 0)))
(clobber (reg:CC 17 flags))
]) "pr89965.c":10 351 {*mulqi3_1}
 (expr_list:REG_DEAD (reg:SI 111 [ c ])
(expr_list:REG_DEAD (reg:QI 89 [ _3 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)
(insn 44 43 45 6 (set (mem:DI (plus:SI (reg/f:SI 7 sp)
(const_int 8 [0x8])) [0  S8 A32])
(const_int 12 [0xc])) "pr89965.c":11 85 {*movdi_internal}
 (nil))
(insn 45 44 46 6 (set (mem:DI (reg/f:SI 7 sp) [0  S8 A32])
(reg/v:DI 101 [ i ])) "pr89965.c":11 85 {*movdi_internal}
 (expr_list:REG_DEAD (reg/v:DI 101 [ i ])
(nil)))
(call_insn/u 46 45 49 6 (set (reg:DI 0 ax)
(call (mem:QI (symbol_ref:SI ("__udivdi3") [flags 0x41]) [0  S1 A8])
(const_int 16 [0x10]))) "pr89965.c":11 699 {*call_value}
 (expr_list:REG_UNUSED (reg:DI 0 ax)
(expr_list:REG_EH_REGION (const_int -2147483648 [0x8000])
(nil)))
(expr_list (use (mem:DI (reg/f:SI 7 sp) [0  S8 A8]))
(expr_list (use (mem:DI (plus:SI (reg/f:SI 7 sp)
(const_int 8 [0x8])) [0  S8 A8]))
(nil
before RA (and the __udivdi3 call is actually dead - ax after it is not used).
Note the result of i >> 7 is first used in the multiplication and later stored
into the argument slot of the call.

Now, RA decides for some reason to first push the i >> 7 into the stack slot
and then load the single byte from it for the purpose of the multiplication:
(insn 41 116 118 6 (parallel [
(set (reg/v:DI 0 ax [orig:101 i ] [101])
(lshiftrt:DI (reg/v:DI 0 ax [orig:101 i ] [101])
(const_int 7 [0x7])))
(clobber (reg:CC 17 flags))
]) "pr89965.c":9 574 {*lshrdi3_doubleword}
 (expr_list:REG_EQUIV (mem:DI (reg/f:SI 7 sp) [0  S8 A32])
(nil)))
(insn 118 41 42 6 (set (mem:DI (reg/f:SI 7 sp) [0  S8 A32])
(reg/v:DI 0 ax [orig:101 i ] [101])) "pr89965.c":9 85 {*movdi_internal}
 (nil))
(insn 42 118 119 6 (set (reg:QI 6 bp [orig:89 _3 ] [89])
(mem:QI (reg/f:SI 7 sp) [0  S1 A32])) "pr89965.c":10 88
{*movqi_internal}
 (nil))
...

Finally, rtl_dce pass has code to DCE not just dead const/pure calls, but also
their arguments, but unfortunately that code (find_call_stack_args) doesn't
seem to take into account that some code might read again from those arguments
(rather than only the call reading from those stack slots).

So I guess the question is, is what the RA did above ok?  If yes, I think
find_call_stack_args needs to be changed to FOR_EACH_SUBRTX NONCONST walk the
SET_SRC (set) and if it finds a MEM that is argument slot, either punt
immediately, or remove those bits from sp_bytes, so that corresponding store
won't be set in arg_stores and we'll punt on that store.

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #50 from Jürgen Reuter  ---
(In reply to Jakub Jelinek from comment #48)
> Perhaps that redefinition of _Atomic should be guarded with
> #if (__STDC_VERSION__ < 201112L) || defined(__cplusplus)
> or so, so that for C -std=c11 you still get _Atomic?

So shall I wait for this to test the fix?

[Bug d/90012] untranslateable placeholder in expressionsem.c

2019-04-09 Thread ibuclaw at gdcproject dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90012

--- Comment #3 from Iain Buclaw  ---
(In reply to Roland Illig from comment #2)
> Thank you for changing this so quickly. Will your change make it into the
> next translation round before the 9.1 release? That would be good because it
> would save me quite some work.

Running e.g. 'msgmerge -U fr.po gcc.pot' will remove all dmd texts from fr.po;
however, I don't know who maintains this, or how updates get submitted to/from
the translation project.

[Bug rtl-optimization/90026] New: [8/9 Regression] ICE: verify_flow_info failed (error: missing barrier after block 2)

2019-04-09 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90026

Bug ID: 90026
   Summary: [8/9 Regression] ICE: verify_flow_info failed (error:
missing barrier after block 2)
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: ice-checking
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

g++-9.0.0-alpha20190407 snapshot (r270192) ICEs when compiling
gcc/testsuite/g++.dg/torture/pr33340.C w/ -O2 (-O3, -Ofast)
-fnon-call-exceptions -ftracer:

% g++-9.0.0-alpha20190407 -O2 -fnon-call-exceptions -ftracer -w -c
gcc/testsuite/g++.dg/torture/pr33340.C
gcc/testsuite/g++.dg/torture/pr33340.C: In function 'void f()':
gcc/testsuite/g++.dg/torture/pr33340.C:29:1: error: missing barrier after block
2
   29 | }
  | ^
during RTL pass: outof_cfglayout
gcc/testsuite/g++.dg/torture/pr33340.C:29:1: internal compiler error:
verify_flow_info failed
0xb37297 verify_flow_info()
   
/var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190407/work/gcc-9-20190407/gcc/cfghooks.c:265
0xb509d5 execute
   
/var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190407/work/gcc-9-20190407/gcc/cfgrtl.c:3622

[Bug c++/85400] invalid Local Dynamic TLS relaxation for symbol defined in method

2019-04-09 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85400

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #10 from Marek Polacek  ---
Note that the pr85400.C test passes even if I revert the patch (on x86_64). 
Does the test really test the issue, or does it just not manifest on x86_64?

[Bug d/90012] untranslateable placeholder in expressionsem.c

2019-04-09 Thread roland.illig at gmx dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90012

--- Comment #2 from Roland Illig  ---
Thank you for changing this so quickly. Will your change make it into the next
translation round before the 9.1 release? That would be good because it would
save me quite some work.

[Bug target/89794] combine incorrectly forwards register value through auto-inc operation

2019-04-09 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89794

Richard Earnshaw  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org
Summary|wrong code with -Og |combine incorrectly
   |-fno-forward-propagate  |forwards register value
   ||through auto-inc operation

--- Comment #5 from Richard Earnshaw  ---
This appears to be combine missing a PRE_MODIFY operation.

After expand we have:

(insn 10 7 11 2 (set (reg:DI 127)
(zero_extend:DI (mem/c:HI (plus:SI (reg/f:SI 103 afp)
(const_int 8 [0x8])) [1 i+0 S2 A32]))) "/tmp/test3.c":10:7
160 {zero_extendhidi2}
 (nil))
...
(insn 24 23 25 2 (set (reg:SI 133)
(plus:SI (reg/f:SI 103 afp)
(const_int 8 [0x8]))) "/tmp/test3.c":12:3 4 {*arm_addsi3}
 (nil))
...
(insn 33 32 34 2 (set (mem/c:HI (reg:SI 133) [0 MEM[(void *)]+0 S2 A16])
(reg:HI 141)) "/tmp/test3.c":12:3 189 {*movhi_insn_arch4}
 (nil))

The auto-inc-dec pass transforms this into:

(insn 50 7 10 2 (set (reg/f:SI 133)
(reg/f:SI 103 afp)) "/tmp/test3.c":10:7 -1
 (nil))
(insn 10 50 12 2 (set (reg:DI 127 [ i ])
(zero_extend:DI (mem/c:HI (pre_modify:SI (reg/f:SI 133)
(plus:SI (reg/f:SI 133)
(const_int 8 [0x8]))) [1 i+0 S2 A32])))
"/tmp/test3.c":10:7 160 {zero_extendhidi2}
 (expr_list:REG_INC (reg/f:SI 133)
(nil)))
...
(insn 33 49 34 2 (set (mem/c:HI (reg/f:SI 133) [0 MEM[(void *)]+0 S2 A16])
(subreg:HI (reg:SI 140) 0)) "/tmp/test3.c":12:3 189 {*movhi_insn_arch4}
 (expr_list:REG_DEAD (reg:SI 140)
(expr_list:REG_DEAD (reg/f:SI 133)
(nil

And combine, missing the pre_modify, then substitutes insn 50 directly into
insn 33

Trying 50 -> 33:
   50: r133:SI=afp:SI
   33: [r133:SI]=r140:SI#0
  REG_DEAD r140:SI
  REG_DEAD r133:SI
Successfully matched this instruction:
(set (mem/c:HI (reg/f:SI 103 afp) [0 MEM[(void *)]+0 S2 A16])
(subreg:HI (reg:SI 140) 0))

Which is clearly wrong as it has now lost the pre-modify operation.

[Bug target/89965] [8/9 Regression] wrong code with -O -mtune=nano-x2 -fcaller-saves -fexpensive-optimizations -fno-tree-dce -fno-tree-ter

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89965

--- Comment #8 from Jakub Jelinek  ---
According to my bisection, this is not reproducible on the trunk starting with
r266862.

[Bug middle-end/89972] [8/9 Regression] ICE in expand_call, at calls.c:4229

2019-04-09 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89972

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #7 from Marek Polacek  ---
I suppose we need to find out the answer first -> WAITING.

[Bug middle-end/89972] [8/9 Regression] ICE in expand_call, at calls.c:4229

2019-04-09 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89972

--- Comment #6 from Marek Polacek  ---
That'd be much appreciated, I was puzzled as to what we should do when I first
took a look at this.

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-09 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763

--- Comment #43 from Jeffrey A. Law  ---
The problem with your suggestions Segher is that we'd have to do them for every
target which defines insns with a zero_extract destination and that's been the
well understood way to handle this stuff for over 2 decades.

Improving combine avoids that problem.  Of course we have to balance the
pros/cons of any patch in that space as well which is hard to do without an
official patch to evaluate.  What I've got is just proof of concept for the
most common case, but it does show some promise.

Also note that Steve's patch just addresses combine_bfi IIUC.  My POC addresses
insv_?.c as well as the existing combine_bfi test (but I haven't tested it
against the deeper tests in Steve's patch).

[Bug middle-end/89972] [8/9 Regression] ICE in expand_call, at calls.c:4229

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89972

--- Comment #5 from Jakub Jelinek  ---
(In reply to H.J. Lu from comment #4)
> > So, do we want to ignore the TYPE_EMPTY_P arguments even for argument
> > alignment computations (both at the caller and callee)?
> 
> We should ask it in x86-64 psABI group.

Can you please do that?
Thanks.

[Bug c++/64867] split warning for passing non-POD to varargs function from -Wconditionally-supported into new warning flag, -Wnon-pod-varargs

2019-04-09 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64867

Eric Gallager  changed:

   What|Removed |Added

 Blocks||87403
Summary|warning for passing non-POD |split warning for passing
   |to varargs function |non-POD to varargs function
   ||from
   ||-Wconditionally-supported
   ||into new warning flag,
   ||-Wnon-pod-varargs

--- Comment #25 from Eric Gallager  ---
retitling


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87403
[Bug 87403] [Meta-bug] Issues that suggest a new warning

[Bug c++/90005] No error produced for the wrong type of string used in gcc >= 5.0

2019-04-09 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90005

Eric Gallager  changed:

   What|Removed |Added

   Keywords||diagnostic
 CC||egallager at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64867

--- Comment #8 from Eric Gallager  ---
There's also a warning about passing non-POD through varargs under
-Wconditionally-supported, and bug 64867 would split it off into a separate
-Wnon-pod-varargs flag.

[Bug middle-end/89972] [8/9 Regression] ICE in expand_call, at calls.c:4229

2019-04-09 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89972

--- Comment #4 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #3)
> Looking at
> struct S { long a[0] __attribute__ ((aligned (32))); };
> long double u;
> void baz (struct S *);
> void bar (long double x, struct S y, long double z)
> {
>   u = x + z;
>   baz ();
> }
> this doesn't ICE, but gcc emits loads from rsp+32 and rsp+64, while clang
> from rbp+16 and rbp+32.
> So, do we want to ignore the TYPE_EMPTY_P arguments even for argument
> alignment computations (both at the caller and callee)?

We should ask it in x86-64 psABI group.

> Do we want some -Wpsabi warning for this?

I think so.

[Bug tree-optimization/90018] [8 Regression] r265453 miscompiled 527.cam4_r in SPEC CPU 2017

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90018

--- Comment #10 from Richard Biener  ---
Looking at the rev. and the context I figured the original caller was
added for a case that can no longer happen (SAME_DR_STMT set, that
can never happen since we rewrote interleaving chain detection for GCC 4.9).

[Bug rtl-optimization/90007] [9 Regression] ICE in extract_constrain_insn_cached, at recog.c:2223

2019-04-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

--- Comment #2 from Alexander Monakov  ---
We have a pseudo:SI<-hardreg:SI assignment followed by
pseudo:DF<-float(pseudo:SI) conversion, and we substitute the latter through
the former, creating a pseudo:DF<-float(hardreg:SI) insn that fails in recog.

I'm not exactly sure why RA would reject reloading the operand when it's a
hardreg, but happily reload when it's a pseudo. Am I missing something obvious,
or are such constraints written down somewhere?

[Bug middle-end/89972] [8/9 Regression] ICE in expand_call, at calls.c:4229

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89972

--- Comment #3 from Jakub Jelinek  ---
Looking at
struct S { long a[0] __attribute__ ((aligned (32))); };
long double u;
void baz (struct S *);
void bar (long double x, struct S y, long double z)
{
  u = x + z;
  baz ();
}
this doesn't ICE, but gcc emits loads from rsp+32 and rsp+64, while clang from
rbp+16 and rbp+32.
So, do we want to ignore the TYPE_EMPTY_P arguments even for argument alignment
computations (both at the caller and callee)?
Do we want some -Wpsabi warning for this?

[Bug tree-optimization/90018] [8 Regression] r265453 miscompiled 527.cam4_r in SPEC CPU 2017

2019-04-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90018

--- Comment #9 from Martin Liška  ---
However, '--size=test' helps here: it fails quickly. With the revision, 2
files differ: mapz_module.fppized.o.s and optics_lib.o.s.
I suspect the latter one.

[Bug middle-end/89972] [8/9 Regression] ICE in expand_call, at calls.c:4229

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89972

Jakub Jelinek  changed:

   What|Removed |Added

   Keywords||ABI
 CC||andi-gcc at firstfloor dot org,
   ||hjl.tools at gmail dot com,
   ||hubicka at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Guess this is primarily an ABI issue, whether such arguments shouldn't be
passed at all even if they have the extra alignments or not.
On:
struct S { long a[0] __attribute__ ((aligned (32))); };
void bar (long double, struct S, long double);
void foo (void)
{
  struct S b;
  bar (8.0L, b, 9.0L);
}
clang doesn't agree with icc: clang passes 8.0L at rsp and 9.0L at rsp+16, while
icc passes them at rsp and rsp+32 (and gcc ICEs).
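
To make the disagreement concrete, the type in question has no data but still
carries a 32-byte alignment. A minimal sketch that inspects it, assuming GNU C
extensions (zero-length arrays, __attribute__ and __alignof__); the helper
names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Same declaration as in the report: a GNU C zero-length array member
   with an explicit 32-byte alignment.  */
struct S { long a[0] __attribute__ ((aligned (32))); };

/* Size of the struct: there is no storage behind the member.  */
static size_t s_size(void)
{
    return sizeof(struct S);
}

/* Alignment of the struct: inherited from the over-aligned member.  */
static size_t s_align(void)
{
    return __alignof__(struct S);
}

/* Empty, yet over-aligned: this combination is what the question about
   ignoring such arguments for alignment computations is about.  */
static void check_empty_but_aligned(void)
{
    assert(s_size() == 0);
    assert(s_align() == 32);
}
```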

[Bug target/89093] [9 Regression] C++ exception handling clobbers d8 VFP register

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89093

--- Comment #34 from Jakub Jelinek  ---
@@ -30877,6 +30883,11 @@ arm_valid_target_attribute_rec (tree args, struct gcc_options *opts)
   else if (!strncmp (q, "arm", 3))
  opts->x_target_flags &= ~MASK_THUMB;

+  else if (!strncmp (q, "general-regs-only", strlen ("general-regs-only")))
+   {
+ opts->x_target_flags |= MASK_GENERAL_REGS_ONLY;
+   }
+
   else if (!strncmp (q, "fpu=", 4))
{
  int fpu_index;

I'm really not sure I understand this strncmp (but also the ones for arm and
thumb), does that mean you want to accept also general-regs-only123 or
general-regs-onlycorgewaldo or thumb__ ?  If you want to support e.g. only
optional whitespace after the string, each handled case should update the
pointer and then something after it should verify/diagnose.
Also, single stmt if bodies shouldn't be wrapped with {} and the arm case is
misindented.
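
The difference between the two behaviours can be sketched in standalone C.
Both helpers below are hypothetical and are not taken from arm.c; the second
one shows one possible reading of "update the pointer and then verify",
assuming only blanks may follow the keyword.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix-only check in the style of the patch: accepts any string that
   merely starts with KEY.  */
static bool prefix_match(const char *q, const char *key)
{
    return strncmp(q, key, strlen(key)) == 0;
}

/* Stricter check: consume KEY, skip optional trailing blanks, then
   require the end of the string.  */
static bool exact_match(const char *q, const char *key)
{
    size_t len = strlen(key);

    assert(q != NULL && key != NULL);
    if (strncmp(q, key, len) != 0)
        return false;
    q += len;                        /* advance past the keyword */
    while (*q == ' ' || *q == '\t')  /* allow optional whitespace */
        q++;
    return *q == '\0';               /* anything else is rejected */
}
```

With these, prefix_match accepts "general-regs-only123" and "thumb__", while
exact_match rejects both and still accepts "general-regs-only" followed by
blanks.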

Otherwise, I think the eh_personality.cc change is acceptable to me (but ask
Jonathan or other libstdc++ maintainers).

[Bug translation/90011] [9 Regression] trailing space in diagnostic

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90011

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jakub Jelinek  ---
Fixed.

[Bug debug/90017] gcc generates wrong debug information at -O3

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90017

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-debug
 CC||aoliva at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Hmm, I think the debuginfo is "correct", at the last invocation of
optimize_me_not () l is indeed 8.  At -O3 we "merely" unrolled the
inner loop completely.  When you step through the program at
each optimize_me_not invocation the value of l is correct but gdb
seems to set only one breakpoint for 'b 15' which you can see doing

(gdb) b optimize_me_not
(gdb) b 15
(gdb) run
Breakpoint 1, optimize_me_not () at t.c:2
2   __asm__ volatile ("" : : : "memory");
(gdb) c
Continuing.

Breakpoint 1, optimize_me_not () at t.c:2
2   __asm__ volatile ("" : : : "memory");
(gdb) c
Continuing.

Breakpoint 1, optimize_me_not () at t.c:2
2   __asm__ volatile ("" : : : "memory");
(gdb) c
Continuing.

Breakpoint 1, optimize_me_not () at t.c:2
2   __asm__ volatile ("" : : : "memory");
(gdb) c
Continuing.

Breakpoint 2, main () at t.c:15
15optimize_me_not();


looking at disassembly with source interleaved shows that this might get
wrong somewhere during debug-info creation, not sure exactly how we
compute the line number program.  Maybe it's also a consumer issue.

[Bug tree-optimization/90018] [8 Regression] r265453 miscompiled 527.cam4_r in SPEC CPU 2017

2019-04-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90018

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |NEW

[Bug tree-optimization/90018] [8 Regression] r265453 miscompiled 527.cam4_r in SPEC CPU 2017

2019-04-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90018

--- Comment #8 from Martin Liška  ---
> 
> Please use GCC 8 branch, not trunk.  The problem only shows up on GCC 8
> branch.

I can confirm that with r265453 I see:

*** Miscompare of cam4_validate.txt; for details see
   
/home/mliska/Programming/cpu2017/benchspec/CPU/527.cam4_r/run/run_peak_refrate_gcc7-m64./cam4_validate.txt.mis
0001:   PASS:  4  points. 
Failure at Step:2   1   1   1
^
'cam4_validate.txt' long

But it's not immediate; it takes a couple of minutes to see it.
I'm reducing that.

[Bug translation/90011] [9 Regression] trailing space in diagnostic

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90011

--- Comment #3 from Jakub Jelinek  ---
Author: jakub
Date: Tue Apr  9 13:19:16 2019
New Revision: 270229

URL: https://gcc.gnu.org/viewcvs?rev=270229=gcc=rev
Log:
PR translation/90011
* typeck2.c (check_narrowing): Remove trailing space from diagnostics.

Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/typeck2.c

[Bug middle-end/89998] [7/8 regression] ICE: verify_gimple failed in printf-return-value

2019-04-09 Thread gandalf at winds dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89998

--- Comment #10 from gandalf at winds dot org ---
(In reply to Jakub Jelinek from comment #9)
> Fixed for trunk.  As a workaround I'd suggest using a correct prototype or
> -fno-builtin-sprintf if you intentionally use a different one.

Thanks. Using the correct prototype (dropping the 'unsigned') indeed works as a
workaround.

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #49 from Iain Sandoe  ---
(In reply to Jakub Jelinek from comment #48)
> Perhaps that redefinition of _Atomic should be guarded with
> #if (__STDC_VERSION__ < 201112L) || defined(__cplusplus)
> or so, so that for C -std=c11 you still get _Atomic?

Sure, right now the idea is to prove that the fix works (since Erik's version
was said not to, I just stripped it down to the minimum).

I have questions open with folks in Apple and clang to see if this issue is an
intentional "enhancement" or an accidental bug (and I think our eventual fix
might depend on the answer to that) - my hope is that the SDK will get reissued
so that we don't need the fix include hack at all.

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #48 from Jakub Jelinek  ---
Perhaps that redefinition of _Atomic should be guarded with
#if (__STDC_VERSION__ < 201112L) || defined(__cplusplus)
or so, so that for C -std=c11 you still get _Atomic?
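
A minimal sketch of such a guard follows. The empty replacement for _Atomic is
a placeholder assumption, not the text of the actual fixincludes patch, and
atomic_qualifier_kept is a made-up probe used only to show the effect.

```c
#include <assert.h>

/* Guard as suggested above: only define _Atomic away for pre-C11 C or
   for C++.  The empty replacement is a placeholder for whatever the
   real hack substitutes.  */
#if (__STDC_VERSION__ < 201112L) || defined(__cplusplus)
# define _Atomic
#endif

/* Returns 1 when the real C11 _Atomic qualifier is still available,
   0 when the guard above has defined it away.  */
static int atomic_qualifier_kept(void)
{
#ifdef _Atomic
    return 0;
#else
    _Atomic int probe = 1;   /* genuine C11 atomic object */
    assert(probe == 1);
    return probe;
#endif
}
```

Compiled as C with -std=c11 or later, the macro is never defined and the probe
uses the real qualifier.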

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #47 from Iain Sandoe  ---
(In reply to Erik Schnetter from comment #46)
> The patch does not include the generated files. You need to run "genfixes"
> in the "fixincludes" directory after applying the patch.

the one I put above has the generated file (fixincl.x)

[Bug tree-optimization/90018] [8/9 Regression] r265453 miscompiled 527.cam4_r in SPEC CPU 2017

2019-04-09 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90018

--- Comment #7 from H.J. Lu  ---
(In reply to Martin Liška from comment #6)
> I've just tested that on -march=skylake-avx512:
> model name: Intel(R) Xeon(R) Platinum 8164 CPU @ 2.00GHz
> 
> r265451 works for me, but I had to increase a stack limit. With default
> limit 8192 I hit segfault here:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x008c890f in dyn_comp::dyn_run (ptop=219.40678, ndt=1800,
> te0=0, dyn_state=..., dyn_in=..., dyn_out=..., rc=-1) at
> dyn_comp.fppized.f90:1142
> 1142 call t_startf ('dyn_run_alloc')
> (gdb) bt
> #0  0x008c890f in dyn_comp::dyn_run (ptop=219.40678,
> ndt=1800, te0=0, dyn_state=..., dyn_in=..., dyn_out=..., rc=-1) at
> dyn_comp.fppized.f90:1142
> #1  0x009e18f9 in stepon::stepon_run1 (dtime_out=,
> phys_state=..., phys_tend=..., dyn_in=..., dyn_out=...) at
> stepon.fppized.f90:427
> #2  0x00a27840 in cam_comp::cam_run1 (cam_in=..., cam_out=...) at
> cam_comp.fppized.f90:195
> #3  0x00a50a04 in atm_comp_mct::atm_init_mct (eclock=...,
> cdata_a=..., x2a_a=..., a2x_a=..., nlfilename= requires 4200145 bytes, which is more than max-value-size>,
> _nlfilename=_nlfilename@entry=0) at atm_comp_mct.fppized.f90:349
> #4  0x00ab0f02 in ccsm_comp_mod::ccsm_init () at
> ccsm_comp_mod.fppized.f90:1577
> #5  0x00402bcd in ccsm_driver () at ccsm_driver.fppized.f90:57
> #6  main (argc=, argv=) at
> ccsm_driver.fppized.f90:25
> #7  0x76e35ea7 in __libc_start_main () from /lib64/libc.so.6
> 
> @H.J. Can you please check the stack limit?

Please use GCC 8 branch, not trunk.  The problem only shows up on GCC 8 branch.

[Bug c++/90010] [8/9 Regression] valgrind error with snprintf and -Wall

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90010

Jakub Jelinek  changed:

   What|Removed |Added

   Keywords|ice-on-valid-code   |diagnostic

--- Comment #4 from Jakub Jelinek  ---
Not an ICE actually, just printing random bytes (1/2/3 at most) at the end of
the %qs string instead of the bytes that should be there.

[Bug c++/90010] [8/9 Regression] valgrind error with snprintf and -Wall

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90010

--- Comment #3 from Jakub Jelinek  ---
Created attachment 46113
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46113=edit
gcc9-pr90010.patch

Untested fix.

[Bug target/90024] [7/8 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to work||9.0
   Target Milestone|--- |7.5
Summary|[7/8/9 Regression] ICE on   |[7/8 Regression] ICE on
   |AArch32 NEON mov with   |AArch32 NEON mov with
   |TImode constant.|TImode constant.

[Bug tree-optimization/90020] [7/8/9 regression] -O2 -Os x86-64 wrong code generated for GNU Emacs

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

--- Comment #11 from Richard Biener  ---
For the RTL issue there's

compute_hash_table_work (struct gcse_hash_table_d *table)
{
...
  /* First pass over the instructions records information used to
 determine when registers and memory are first and last set.  */
  FOR_BB_INSNS (current_bb, insn)
{
  if (!NONDEBUG_INSN_P (insn))
continue;

  if (CALL_P (insn))
{
  hard_reg_set_iterator hrsi;
  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call,
  0, regno, hrsi)
record_last_reg_set_info (insn, regno);

  if (! RTL_CONST_OR_PURE_CALL_P (insn))
record_last_mem_set_info (insn);

which eventually initializes blocks_with_calls which prunes transp.  But
the calls in question are marked PURE but also
RTL_LOOPING_CONST_OR_PURE_CALL_P.
So the obvious thing for the above is to still mark the block for
RTL_LOOPING_CONST_OR_PURE_CALL_P.

Testing overall patch.

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread schnetter at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #46 from Erik Schnetter  ---
The patch does not include the generated files. You need to run "genfixes" in
the "fixincludes" directory after applying the patch.

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #45 from Iain Sandoe  ---
(In reply to Jürgen Reuter from comment #44)
> (In reply to Iain Sandoe from comment #43)
> > Created attachment 46110 [details]
> > Proof-of-principle path
> > 
> > Does this work for you?
> >  - my local testing says it generates the right wrapped include file.
> > 
> > (perhaps the constraint on darwin version was too tight in Erik's case)
> 
> Sorry for my ignorance, but how do I apply this? Do I just patch, and then
> configure and compile/bootstrap as a normal svn checkout, or do I have to do
> anything special regarding the fixincludes? And if so, is there a link to
> some documentation?

the patch includes the generated files, so yes ...

patch -p1 < .. 

and then configure and make

[Bug c++/90010] [8/9 Regression] valgrind error with snprintf and -Wall

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90010

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |jakub at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

[Bug tree-optimization/90020] [7/8/9 regression] -O2 -Os x86-64 wrong code generated for GNU Emacs

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

--- Comment #10 from Richard Biener  ---
(In reply to Martin Liška from comment #6)
> I bisected GCC 4.9.x branch and it started with r215059, which is a backport
> of 3 patches. I reverted changes in:
> patching file gcc/recog.c
> patching file gcc/tree-ssa-loop-niter.c
> patching file gcc/tree-vect-slp.c
> 
> and so that it points to backport of PR61672.

Note that was a fix for the fallout of r208113 so before that rev. the issue
should "re-appear" in the past.

[Bug tree-optimization/90020] [7/8/9 regression] -O2 -Os x86-64 wrong code generated for GNU Emacs

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

--- Comment #9 from Richard Biener  ---
/* { dg-do run } */
/* { dg-require-weak "" } */

void __attribute__((noinline,noclone))
check (int i)
{
  if (i == 0)
    __builtin_exit (0);
}

int i;
extern int x __attribute__((weak));

int main(int argc, char **argv)
{
  if (argc)
    {
      check (i);
      return x;
    }
  else
    {
      check (i);
      return x-1;
    }
  return 0;
}


FAILs at -O2 due to GIMPLE PRE and at -Os due to RTL hoist (if GIMPLE PRE
is fixed).

[Bug target/90024] [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

--- Comment #2 from Matthew Malcomson  ---
Author: matmal01
Date: Tue Apr  9 11:39:59 2019
New Revision: 270226

URL: https://gcc.gnu.org/viewcvs?rev=270226&root=gcc&view=rev
Log:
Hi there,

The "*neon_mov" patterns for 128-bit quantities use the "Dn"
constraint to match vmov.f32 and vmov.i patterns.
This constraint boils down to using the `neon_immediate_valid` function.
Once the constraint has matched, the output C statement asserts that the
function passes.

The output C statement calls `neon_immediate_valid` with the mode taken from
the iterator, while the constraint takes the mode from the operand.
This can cause a discrepancy when the operand is a CONST_INT, as the constraint
passes VOIDmode, which `neon_immediate_valid` treats as DImode, while the C
statement passes the mode of the iterator, which can be TImode.
When this happens, `neon_immediate_valid` can fail in the second call (if
e.g. the CONST_INT is a valid immediate in DImode but not TImode), which would
trigger the assertion.
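
As a toy illustration of how one value can pass in one mode and fail in the
other (this is NOT the logic in arm.c; the encoding rule, the enum and the
function names are all assumptions made up for this sketch): suppose an
immediate is accepted when every byte of a 64-bit value is 0x00 or 0xFF, and
a 128-bit constant additionally needs both 64-bit halves to be equal because
the immediate is replicated into each half.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode tags, standing in for DImode and TImode.  */
enum mode_model { MODEL_DI, MODEL_TI };

/* Assumed encoding rule: each byte of V must be 0x00 or 0xFF.  */
static bool
valid_byte_mask (uint64_t v)
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t b = (uint8_t) (v >> (8 * i));
      if (b != 0x00 && b != 0xFF)
        return false;
    }
  return true;
}

/* LO and HI are the two halves of a 128-bit constant.  In the DI model
   only LO is looked at; in the TI model both halves must match.  */
static bool
immediate_valid_model (uint64_t lo, uint64_t hi, enum mode_model mode)
{
  if (!valid_byte_mask (lo))
    return false;
  return mode == MODEL_DI || hi == lo;
}
```

In this model the constant with lo = 0xFF and hi = 0 is accepted when checked
as DI but rejected when checked as TI, which is the shape of mismatch that
lets the constraint match and the later assertion fail.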

The testcase added with this patch triggers this when compiled with an arm
cross compiler using the command line below.
gcc -march=armv8-a -c neon-immediate-timode.c -O1 -mfloat-abi=hard -mfpu=neon-fp-armv8

This patch splits the original "Dn" constraint into three new constraints, "DN"
for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for CONST_VECTOR.
Splitting things up this way requires using one extra alternative in the
"*neon_mov" patterns, but makes it clear from the constraint what mode is
being used.

We also remove the behaviour of treating VOIDmode as DImode in
`neon_valid_immediate` since the original "Dn" constraint was the only place
that functionality was used.  VOIDmode is now never passed to that function.
An assertion has been added to the function to ensure this problem is caught
earlier on.

Bootstrapped on arm-none-linux-gnueabihf
Regtested on cross-compiler arm-none-eabi

gcc/ChangeLog:

2019-04-09  Matthew Malcomson  

PR target/90024
* config/arm/arm.c (neon_valid_immediate): Disallow VOIDmode parameter.
* config/arm/constraints.md (Dm, DN, Dn): Split previous Dn constraint
into three.
* config/arm/neon.md (*neon_mov): Account for TImode and DImode
differences directly.
(*smax3_neon, vashl3, vashr3_imm): Use Dm constraint.

gcc/testsuite/ChangeLog:

2019-04-09  Matthew Malcomson  

PR target/90024
* gcc.dg/torture/neon-immediate-timode.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/torture/neon-immediate-timode.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm.c
trunk/gcc/config/arm/constraints.md
trunk/gcc/config/arm/neon.md
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/89998] [7/8 regression] ICE: verify_gimple failed in printf-return-value

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89998

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[7/8/9 regression] ICE:     |[7/8 regression] ICE:
                   |verify_gimple failed in     |verify_gimple failed in
                   |printf-return-value         |printf-return-value

--- Comment #9 from Jakub Jelinek  ---
Fixed for trunk.  As a workaround I'd suggest using a correct prototype or
-fno-builtin-sprintf if you intentionally use a different one.
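
For the first workaround, a minimal sketch of "using a correct prototype" is
simply to take the declaration from <stdio.h> instead of declaring sprintf by
hand with a different return type.  The helper name format_value is only an
example.

```c
#include <stdio.h>

/* sprintf is declared by <stdio.h> with its standard prototype, so the
   return value is an int holding the number of characters written.  */
int
format_value (char *buf, int value)
{
  return sprintf (buf, "%d", value);
}
```

Code that really does need a non-standard declaration would instead be built
with -fno-builtin-sprintf, as suggested above.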

[Bug bootstrap/89864] [9 regression] gcc fails to build/bootstrap with XCode 10.2

2019-04-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89864

--- Comment #44 from Jürgen Reuter  ---
(In reply to Iain Sandoe from comment #43)
> Created attachment 46110 [details]
> Proof-of-principle path
> 
> Does this work for you?
>  - my local testing says it generates the right wrapped include file.
> 
> (perhaps the constraint on darwin version was too tight in Erik's case)

Sorry for my ignorance, but how do I apply this? Do I just patch, and then
configure and compile/bootstrap as a normal svn checkout, or do I have to do
anything special regarding the fixincludes? And if so, is there a link to some
documentation?

[Bug tree-optimization/90020] [7/8/9 regression] -O2 -Os x86-64 wrong code generated for GNU Emacs

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener  ---
I have a patch for GIMPLE and will produce a nicer testcase for that.  Will
also look at the RTL hoisting issue.

[Bug tree-optimization/90020] [7/8/9 regression] -O2 -Os x86-64 wrong code generated for GNU Emacs

2019-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90020

--- Comment #7 from Richard Biener  ---
So one issue I can see is code-hoisting hoisting
MEM[(struct window *)window_6(D) + -5B].contents across a call that might
not return.  This can only happen for calls we can alias-disambiguate
against, which in this case means pure calls.  For divisions, for example,
we guard against this case by checking whether it may trap and there
was an earlier call that might not return.  That is missing for memory
references.

   [local count: 1073741824]:
  # VUSE <.MEM_5(D)>
  _1 = WINDOWP (window_6(D));
  if (_1 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870913]:
  # VUSE <.MEM_5(D)>
  _2 = MEM[(struct window *)window_6(D) + -5B].contents;
  # VUSE <.MEM_5(D)>
  _3 = BUFFERP (_2);
  _8 = (int) _3;

   [local count: 1073741824]:
  # iftmp.1_4 = PHI <_8(3), 0(2)>
  # VUSE <.MEM_5(D)>
  CHECK_TYPE (iftmp.1_4, 4856B, window_6(D));
  # VUSE <.MEM_5(D)>
  _7 = MEM[(struct window *)window_6(D) + -5B].contents;
  # VUSE <.MEM_5(D)>
  return _7;

But fixing that on the GIMPLE level doesn't make the issue go away since
we have similar functionality on RTL which triggers (and is the older
issue since GIMPLE can do hoisting only since GCC 7).
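
A small C sketch of the source-level pattern (hypothetical names, much
simplified from the Emacs code): the load is only safe because the preceding
call does not return when the pointer is bad, so moving the load above the
call changes behaviour.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Emacs window object.  */
struct window_model
{
  int contents;
};

/* A check that does not return when the condition fails.  */
static void
check_type_model (int ok)
{
  if (!ok)
    exit (0);
}

int
window_contents_model (const struct window_model *w)
{
  check_type_model (w != NULL);
  /* This load may trap if W is invalid, so it must stay below the
     check; hoisting it above the call would dereference W even on
     the path where the check exits.  */
  return w->contents;
}
```

Called with a valid pointer the function just returns the field; called with
NULL it exits in the check and never reaches the load.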

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #18 from rsandifo at gcc dot gnu.org ---
(In reply to kugan from comment #12)
> (In reply to rsand...@gcc.gnu.org from comment #10)
> > (In reply to kugan from comment #9)
> > > Created attachment 46040 [details]
> > > patch
> > 
> > Wasn't sure whether this patch was WIP or the final version
> > for review, but we need to do something more generic than
> > dividing by 4.  I think the test will still fail with "int"
> > changed to "short" for example.
> > 
> > I also don't think the new candidate should be tied to the
> > mask/load store functions.  Maybe one approach would be to
> > check when adding a zero-based candidate for a use in:
> > 
> >   /* Record common candidate with initial value zero.  */
> >   basetype = TREE_TYPE (iv->base);
> >   if (POINTER_TYPE_P (basetype))
> > basetype = sizetype;
> >   record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> > 
> > whether the use actually benefits from this unscaled iv.
> > If the use is USE_REF_ADDRESS, we could compare the cost
> > of an address with an unscaled index with the cost of an address
> > with a scaled index.  I think the natural scale value to try
> > would be GET_MODE_INNER (TYPE_MODE (mem_type)).
> 
> Thanks for the comments. I agree this is the right place. But I am not sure
> if checking the cost at this point is what IV opt generally does. In
> general, IV-opt adds candidates which can be helpful and later decides the
> optimal set. 

But I was talking about comparing the cost of the address rather
than the cost of the iv.  Like you say, the idea is to add candidates
that might be useful, and what we want to know here is whether the
byte offset is likely to be a useful candidate for this use.

Another way of deciding whether to go for a scaled candidate would
be to test for a legitimate address directly (rather than via
address costs) if you prefer that.  I just thought using address
costs might be easier.

We could also keep the unscaled candidate in addition to the
new scaled one if we have evidence that having both is useful.
The danger is that if we add too many, we'll trip the iv limit,
so I think we'd need positive evidence for keeping both.

> If we are to use get_computation_cost to see the costs, we have to create
> iv_cand and then discard. Since we are adding only one candidate and that
> too for SVE like targets, I am thinking that it is OK. If you still prefer
> to check the cost, I will change that.

IMO it's a generic concept that just happens to apply to SVE.
If an architecture is going to support just one "reg+reg" addressing
mode, the two obvious choices are for the offset register to be unscaled
(bytes) or scaled by the element or access size (indices).  SVE chose
the latter.  In that case, the most useful candidate is likely to be
the index rather than the byte offset.

This applies to single-vector loads and stores as well as
LOAD/STORE_LANES.  The reason we usually get good iv choices
for single vectors is that the index usually exists as a candidate
already, in the form of the loop control iv.  (This is of course the
main benefit to base+scaled addressing over base+unscaled addressing.)
But it's probably possible to construct examples in which the
index candidate doesn't already exist even for single vectors.
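
To illustrate the two kinds of candidate at the source level (a sketch with
made-up function names, not ivopts code): the first loop walks the array with
an element index, which is also the loop control iv, while the second uses a
separate byte offset that steps by the element size.

```c
#include <stddef.h>
#include <stdint.h>

/* Index candidate: the iv counts elements and the address is
   base + i * sizeof (element), i.e. base plus a scaled index.  */
int32_t
sum_indexed (const int32_t *base, size_t n)
{
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += base[i];
  return sum;
}

/* Byte-offset candidate: the iv counts bytes and the address is
   base + off, i.e. base plus an unscaled offset.  */
int32_t
sum_byte_offset (const int32_t *base, size_t n)
{
  int32_t sum = 0;
  for (size_t off = 0; off < n * sizeof *base; off += sizeof *base)
    sum += *(const int32_t *) ((const char *) base + off);
  return sum;
}
```

Both compute the same result; on a target whose only "reg+reg" mode scales
the offset register, the first form maps directly onto the addressing mode.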

> Attached patch (only the ivopt changes) and testcase

[Bug middle-end/90025] [9 Regression] botan2 miscompilation on s390x-linux since r268957

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90025

--- Comment #1 from Jakub Jelinek  ---
Created attachment 46112
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46112&action=edit
gcc9-pr90025.patch

Untested fix.

[Bug target/88809] do not use rep-scasb for inline strlen/memchr

2019-04-09 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88809

Peter Cordes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #4 from Peter Cordes  ---
Yes, rep scasb is abysmal, and gcc -O3's 4-byte-at-a-time scalar loop is not
very good either.

With 16-byte alignment (which we have from calloc on x86-64 System V), we can
inline a *much* better SSE2 loop.  See
https://stackoverflow.com/a/55589634/224132 for more details and
microbenchmarks.

On Skylake it's about 4 to 5x faster than the current 4-byte loop for large
strings, 3x faster for short strings.  For short strings (strlen=33), it's
about 1.5x faster than calling strlen.  For very large strings (too big for L2
cache), it's ~1.7x slower than glibc's AVX2 strlen.

The lack of VEX encoding for pxor and pmovmskb is just me being lazy; let gcc
emit them all with VEX if AVX is enabled.

   # at this point gcc has `s` in RDX, `i` in ECX

pxor   %xmm0, %xmm0 # zeroed vector to compare against
.p2align 4
.Lstrlen16: # do {
#ifdef __AVX__
vpcmpeqb   (%rdx), %xmm0, %xmm1
#else
movdqa (%rdx), %xmm1
pcmpeqb    %xmm0, %xmm1   # xmm1 = -1 where there was a 0 in memory
#endif

add $16, %rdx # ptr++
pmovmskb  %xmm1, %eax # extract high bit of each byte to a 16-bit mask
test   %eax, %eax
jz     .Lstrlen16    # }while(mask==0);
# RDX points at the 16-byte chunk *after* the one containing the terminator
# EAX = bit-mask of the 0 bytes, and is known to be non-zero
bsf    %eax, %eax   # EAX = bit-index of the lowest set bit

# terminator is at rdx+rax - 16
#  movb   $'A', -16(%rdx, %rax)  // for a microbench that used s[strlen(s)]='A'
sub    %rbp, %rdx   # p -= start
lea   -16(%rdx, %rax)   # p += byte_within_vector - 16

We should actually use  REP BSF  because that's faster on AMD (tzcnt), and same
speed on Intel.
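
For readers who prefer C, the same loop can be sketched with SSE2 intrinsics
(x86 only, GCC/clang builtins; the function name is made up, and the pointer
is assumed to be 16-byte aligned as in the calloc case above).  This is an
illustration of the technique, not what gcc would emit.

```c
#include <emmintrin.h>
#include <stddef.h>

/* S must be 16-byte aligned.  Aligned 16-byte loads never cross a page
   boundary, so reading past the terminator within a chunk is safe at
   the hardware level.  */
size_t
strlen_sse2_aligned16 (const char *s)
{
  const __m128i zero = _mm_setzero_si128 ();
  const char *p = s;

  for (;;)
    {
      __m128i chunk = _mm_load_si128 ((const __m128i *) p);
      /* One mask bit per byte, set where the byte is 0.  */
      unsigned mask
        = (unsigned) _mm_movemask_epi8 (_mm_cmpeq_epi8 (chunk, zero));
      if (mask != 0)
        /* Index of the lowest set bit = offset of the terminator.  */
        return (size_t) (p - s) + (size_t) __builtin_ctz (mask);
      p += 16;
    }
}
```

The __builtin_ctz call plays the role of the bsf / tzcnt in the asm; mask is
known to be non-zero at that point, so the result is well defined.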


Also an inline-asm implementation of it with a microbenchmark adapted from the
SO question.  (Compile with -DUSE_ASM -DREAD_ONLY to benchmark a fixed length
repeatedly)
https://godbolt.org/z/9tuVE5

It uses clock() for timing, which I didn't bother updating.  I made it possible
to run it for lots of iterations for consistent timing.  (And so the real work
portion dominates the runtime so we can use perf stat to measure it.)




If we only have 4-byte alignment, maybe check the first 4B, then do (p+4) & ~7
to either overlap that 4B again or not when we start 8B chunks.  But probably
it's good to get to 16-byte alignment and do whole SSE2 vectors, because
repeating an aligned 16-byte test that overlaps an 8-byte test costs the same
as doing another 8-byte test.  (Except on CPUs like Bobcat that split 128-bit
vectors into 64-bit halves).  The extra AND to round down to an alignment
boundary is all it takes, plus the code-size cost of peeling 1 iteration each
of 4B and 8B before a 16-byte loop.

We can use 4B / 8B with movd / movq instead of movdqa.  For pmovmskb, we can
ignore the compare-true results for the upper 8 bytes by testing the result
with `test %al,%al`, or in general with `test $0x0F, %al` to check only the low
4 bits of EAX for the 4-byte case.



The scalar bithack version can use BSF instead of CMOV binary search for the
byte with a set high bit.  That should be a win if we ever wanted to do scalar
on some x86 target especially with 8-byte registers, or on AArch64.  AArch64
can rbit / clz to emulate bsf and find the position of the first set bit.
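
A sketch of that scalar idea in C, assuming a little-endian 8-byte load and
the GCC/clang __builtin_ctzll builtin (the function name is made up): the
usual "has zero byte" bithack leaves the high bit set in the first zero byte,
and a bit-scan then gives its position directly instead of a binary search.

```c
#include <stdint.h>

/* V must contain at least one zero byte.  Returns the index (0 = least
   significant byte) of the first zero byte.  Bits above the first zero
   byte may be false positives from borrow propagation, but the lowest
   set bit of M is always exact.  */
static inline unsigned
first_zero_byte (uint64_t v)
{
  uint64_t m = (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
  return (unsigned) __builtin_ctzll (m) >> 3;
}
```

On x86 the builtin becomes bsf / tzcnt; on AArch64 it becomes the rbit / clz
pair mentioned above.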

(Without efficient SIMD compare result -> integer_mask, or efficient SIMD ->
integer at all on some ARM / AArch64 chips, SIMD compares for search loops
aren't always (ever?) a win.  IIRC, glibc strlen and memchr don't use vectors
on ARM / AArch64, just scalar bithacks.)

[Bug tree-optimization/90018] [8/9 Regression] r265453 miscompiled 527.cam4_r in SPEC CPU 2017

2019-04-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90018

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |WAITING

--- Comment #6 from Martin Liška  ---
I've just tested that on -march=skylake-avx512:
model name  : Intel(R) Xeon(R) Platinum 8164 CPU @ 2.00GHz

r265451 works for me, but I had to increase the stack limit. With the default
limit of 8192 I hit a segfault here:

Program received signal SIGSEGV, Segmentation fault.
0x008c890f in dyn_comp::dyn_run (ptop=219.40678, ndt=1800,
te0=0, dyn_state=..., dyn_in=..., dyn_out=..., rc=-1) at
dyn_comp.fppized.f90:1142
1142   call t_startf ('dyn_run_alloc')
(gdb) bt
#0  0x008c890f in dyn_comp::dyn_run (ptop=219.40678, ndt=1800,
te0=0, dyn_state=..., dyn_in=..., dyn_out=..., rc=-1) at
dyn_comp.fppized.f90:1142
#1  0x009e18f9 in stepon::stepon_run1 (dtime_out=,
phys_state=..., phys_tend=..., dyn_in=..., dyn_out=...) at
stepon.fppized.f90:427
#2  0x00a27840 in cam_comp::cam_run1 (cam_in=..., cam_out=...) at
cam_comp.fppized.f90:195
#3  0x00a50a04 in atm_comp_mct::atm_init_mct (eclock=..., cdata_a=...,
x2a_a=..., a2x_a=..., nlfilename=,
_nlfilename=_nlfilename@entry=0) at atm_comp_mct.fppized.f90:349
#4  0x00ab0f02 in ccsm_comp_mod::ccsm_init () at
ccsm_comp_mod.fppized.f90:1577
#5  0x00402bcd in ccsm_driver () at ccsm_driver.fppized.f90:57
#6  main (argc=, argv=) at
ccsm_driver.fppized.f90:25
#7  0x76e35ea7 in __libc_start_main () from /lib64/libc.so.6

@H.J. Can you please check the stack limit?

[Bug middle-end/90025] [9 Regression] botan2 miscompilation on s390x-linux since r268957

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90025

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-04-09
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Target Milestone|---                         |9.0
     Ever confirmed|0                           |1

[Bug middle-end/90025] New: [9 Regression] botan2 miscompilation on s390x-linux since r268957

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90025

            Bug ID: 90025
           Summary: [9 Regression] botan2 miscompilation on s390x-linux
                    since r268957
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

The following testcase is miscompiled e.g. with -O2 -march=zEC12 -mtune=z13 on
s390x-linux:

__attribute__((noipa)) void
bar (char *p)
{
  int i;
  for (i = 0; i < 6; i++)
    if (p[i] != "foobar"[i])
      __builtin_abort ();
  for (; i < 32; i++)
    if (p[i] != '\0')
      __builtin_abort ();
}

__attribute__((noipa)) void
foo (unsigned int x)
{
  char s[32] = { 'f', 'o', 'o', 'b', 'a', 'r', 0 };
  ((unsigned int *) s)[2] = __builtin_bswap32 (x);
  bar (s);
}

int
main ()
{
  foo (0);
  return 0;
}

The problem is that since that change we emit a store_by_pieces (8 bytes)
followed by clear_storage (24 bytes), but the object we pass to the latter
actually has S1 (i.e. a MEM size of 1 byte), so DSE2 then happily removes it
when it sees a further store of 4 bytes to s+8.

[Bug target/90024] [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

Matthew Malcomson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm
      Known to work|                            |4.9.0

--- Comment #1 from Matthew Malcomson  ---
The "*neon_mov" patterns for 128-bit quantities use the "Dn"
constraint to match vmov.f32 and vmov.i patterns.

This constraint boils down to using the `neon_immediate_valid` function.
Once the constraint has matched, the output C statement asserts that the same
function passes.

The output C statement calls `neon_immediate_valid` with the mode taken from
the iterator, while the constraint takes the mode from the operand.


In the above testcase the operand is a CONST_INT, which means the constraint
passes VOIDmode (treated the same as DImode in `neon_immediate_valid`), while
the C statement passes TImode (the mode of the iterator).

This causes the second call to `neon_immediate_valid` to fail, as the value
provided is valid in DImode but not in TImode, and that causes the ICE.


The attached patch splits the original "Dn" constraint into three new
constraints, "DN" for TImode CONST_INT, "Dn" for DImode CONST_INT, and "Dm" for
CONST_VECTOR.
This requires one extra alternative in the "*neon_mov" patterns, but
makes it clear from the constraint what mode is being used.

We use the "DN" constraint for the define_insn that matches TImode values, and
hence avoid the above problem.

[Bug target/90024] New: [7/8/9 Regression] ICE on AArch32 NEON mov with TImode constant.

2019-04-09 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90024

            Bug ID: 90024
           Summary: [7/8/9 Regression] ICE on AArch32 NEON mov with TImode
                    constant.
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code, patch
          Severity: normal
          Priority: P3
         Component: target
          Assignee: matmal01 at gcc dot gnu.org
          Reporter: matmal01 at gcc dot gnu.org
  Target Milestone: ---

Created attachment 46111
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46111&action=edit
Proposed fix

The below code causes an ICE for AArch32 targets with NEON at all optimisation
levels except -O0.


union a { 
  char b; 
  long long c; 
}; 
union a d; 
int g(int, union a, union a); 
void e() { 
  union a f[2] = {-1L}; 
  g(0, d, f[0]); 
} 


With the backtrace below.

$ arm-none-eabi-gcc -march=armv8-a -c test.c -O1 -mfloat-abi=hard -mfpu=neon-fp-armv8
during RTL pass: final
test.c: In function 'e':
test.c:10:1: internal compiler error: in output_950, at config/arm/neon.md:89
   10 | }
  | ^
0x1352bfb output_950
   /tmp/dgboter/bbs/rhev-vm4--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/config/arm/neon.md:89
0x8aafbd get_insn_template(int, rtx_insn*)
   /tmp/dgboter/bbs/rhev-vm4--rhe6x86_64/buildbot/rhe6x86_64--arm-none-eabi/build/src/gcc/gcc/final.c:2071


I have a patch to fix the problem and am creating this bugzilla report for
tracking purposes (the patch is added as an attachment; the explanation will
be added in comments).

[Bug target/89794] wrong code with -Og -fno-forward-propagate

2019-04-09 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89794

--- Comment #4 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #3)
> Guess with PR89475 fix this will be latent, unless one disables ccp.
> Anyway, to me this looks like a backend bug.  The function is leaf, but for
> some strange reason LRA uses the lr register and so lr needs to be pushed
> and popped, but that push/pop doesn't seem to be accounted for in the afp to
> sp elimination offset computation.

I'm still seeing it in a build from 2019/04/04, so not latent.

Current suspect is the code in arm_compute_elimination_offset (in arm.c), where
we eliminate from the arg pointer to the stack pointer.  The comment says that
if there has been nothing pushed on the stack at all, then the offset result
should be '-4' (and asserts strongly in the comments that this is the correct
result) --- I don't understand why that should be the case.  However, that code
is essentially 18 years old, so I'm not going to try messing with it until I
understand it better.

[Bug translation/90011] [9 Regression] trailing space in diagnostic

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90011

--- Comment #2 from Jakub Jelinek  ---
Author: jakub
Date: Tue Apr  9 10:27:14 2019
New Revision: 270225

URL: https://gcc.gnu.org/viewcvs?rev=270225&root=gcc&view=rev
Log:
PR translation/90011
* ipa-devirt.c (compare_virtual_tables): Remove two trailing spaces
from diagnostics.
* config/arm/freebsd.h (LINK_SPEC): Remove trailing space from -p
diagnostics.
* config/riscv/freebsd.h (LINK_SPEC): Likewise.
* config/aarch64/aarch64-freebsd.h (FBSD_TARGET_LINK_SPEC): Likewise.
* config/darwin.h (DRIVER_SELF_SPECS, ASM_FINAL_SPEC): Remove
trailing space from -gsplit-dwarf diagnostics.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64-freebsd.h
trunk/gcc/config/arm/freebsd.h
trunk/gcc/config/darwin.h
trunk/gcc/config/riscv/freebsd.h
trunk/gcc/ipa-devirt.c

[Bug middle-end/89998] [7/8/9 regression] ICE: verify_gimple failed in printf-return-value

2019-04-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89998

--- Comment #8 from Jakub Jelinek  ---
Author: jakub
Date: Tue Apr  9 10:26:13 2019
New Revision: 270224

URL: https://gcc.gnu.org/viewcvs?rev=270224&root=gcc&view=rev
Log:
PR tree-optimization/89998
* gimple-ssa-sprintf.c (try_substitute_return_value): Use lhs type
instead of integer_type_node if possible, don't add ranges if return
type is not compatible with int.
* gimple-fold.c (gimple_fold_builtin_sprintf,
gimple_fold_builtin_snprintf): Use lhs type instead of hardcoded
integer_type_node.

* gcc.c-torture/compile/pr89998-1.c: New test.
* gcc.c-torture/compile/pr89998-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/compile/pr89998-1.c
trunk/gcc/testsuite/gcc.c-torture/compile/pr89998-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-fold.c
trunk/gcc/gimple-ssa-sprintf.c
trunk/gcc/testsuite/ChangeLog

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #17 from kugan at gcc dot gnu.org ---
(In reply to Wilco from comment #16)
> (In reply to kugan from comment #15)
> > (In reply to Wilco from comment #11)
> > > There is also something odd with the way the loop iterates, this doesn't
> > > look right:
> > > 
> > > whilelo p0.s, x3, x4
> > > incw    x3
> > > ptest   p1, p0.b
> > > bne .L3
> > 
> > I am not sure I understand this. I tried with qemu using an execution
> > testcase and it seems to work.
> > 
> > whilelo p0.s, x4, x5
> > incw    x4
> > ptest   p1, p0.b
> > bne .L3
> > In my case I have the above (register allocation difference only) incw is
> > correct considering two vector word registers? Am I missing something here?
> 
> I'm talking about the completely redundant ptest, where does that come from?

It is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88836
