from:"krister.walfridsson at gmail dot com"

[Bug c/94392] New: Infinite loops are optimized away for C99

2020-03-29 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94392

Bug ID: 94392
   Summary: Infinite loops are optimized away for C99
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 48141
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48141&action=edit
Source code reproducing the issue

John Regehr noticed on Twitter
(https://twitter.com/johnregehr/status/1244335355509129216) that trunk GCC
removes infinite loops for C99, as can be seen by compiling with
  gcc -O3 -std=c99 fermat.c

This behavior was introduced when -ffinite-loops was enabled at -O2. This is
fine for C11, but infinite loops do not invoke undefined behavior in C99, so
the optimization should not be enabled by default for -std=c99.
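
The attached fermat.c is not reproduced in this message, so the following is only a hypothetical stand-in for the kind of loop involved; the function names and the search bound are assumptions made for illustration. The first function is a bounded search that terminates. The second wraps it in a loop that has no side effects and can only exit by finding a counterexample that does not exist, so a compiler that assumes the loop terminates could conclude that it returns 1.

```c
#include <assert.h>

/* Hypothetical stand-in for the attached fermat.c (not the original).
   Searches 1 <= a, b, c <= max for a^3 + b^3 == c^3.
   Returns 1 if a counterexample is found, 0 otherwise. */
int fermat_counterexample_below(int max)
{
  for (int a = 1; a <= max; a++)
    for (int b = 1; b <= max; b++)
      for (int c = 1; c <= max; c++)
        if ((long long) a * a * a + (long long) b * b * b
            == (long long) c * c * c)
          return 1;
  return 0;
}

/* Side-effect-free loop whose only exit is "return 1".  No counterexample
   exists, so this never returns when executed faithfully.  It is shown for
   its shape only and must not be called from a test. */
int fermat_unbounded(void)
{
  int max = 1;
  while (1)
    {
      if (fermat_counterexample_below(max))
        return 1;
      max = max % 100 + 1;   /* cycle through 1..100, no signed overflow */
    }
}

/* Small self-check of the bounded search. */
void check_fermat(void)
{
  assert(fermat_counterexample_below(20) == 0);
}
```

Comparing the code generated for fermat_unbounded with and without -ffinite-loops is one way to see whether the loop is kept.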

[Bug tree-optimization/81388] Incorrect code generation with -O1 -fno-strict-overflow

2017-07-12 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388

krister.walfridsson at gmail dot com changed:

           What    |Removed |Added
                 CC|        |krister.walfridsson at gmail dot com

--- Comment #5 from krister.walfridsson at gmail dot com ---
Comment #1 says that "-fno-strict-overflow Does not effect Pointers", but the
manual says for -fstrict-overflow:

"This option also allows the compiler to assume strict pointer semantics: given
a pointer to an object, if adding an offset to that pointer does not produce a
pointer to the same object, the addition is undefined. This permits the
compiler to conclude that p + u > p is always true for a pointer p and unsigned
integer u.  This assumption is only valid because pointer wraparound is
undefined, as the expression is false if p + u overflows using twos complement
arithmetic."

I read this as saying that -fno-strict-overflow permits pointer overflow too.
Is that incorrect? If so, then I think the manual should be
corrected/clarified...
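
To make the quoted wraparound case concrete, here is a minimal sketch that models the address computation on uintptr_t, where overflow is well defined. The helper name is an assumption for illustration; the code does not perform real pointer arithmetic and says nothing about what GCC does with either flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models "p + u" with wrapping address arithmetic by adding in uintptr_t.
   Returns 1 if the sum wraps around and ends up below p, which is the case
   in which "p + u > p" is false under two's complement arithmetic. */
int address_add_wraps(uintptr_t p, size_t u)
{
  uintptr_t sum = p + (uintptr_t) u;   /* unsigned addition wraps modulo 2^N */
  return sum < p;
}

/* Small self-check: one address near the top of the address space that
   wraps, and one ordinary address that does not. */
void check_address_add_wraps(void)
{
  assert(address_add_wraps(UINTPTR_MAX - 1, 4) == 1);
  assert(address_add_wraps(1000, 4) == 0);
}
```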

[Bug c/80852] Optimisation fails to recognise sum computed by loop

2017-05-23 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80852

krister.walfridsson at gmail dot com changed:

           What    |Removed |Added
                 CC|        |krister.walfridsson at gmail dot com

--- Comment #3 from krister.walfridsson at gmail dot com ---
This is related to (or a dup of) tree-optimization/46186

[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO

2017-05-03 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

krister.walfridsson at gmail dot com changed:

           What    |Removed |Added
                 CC|        |krister.walfridsson at gmail dot com

--- Comment #7 from krister.walfridsson at gmail dot com ---
Yes, it works with GCC 6, and it used to work with GCC 7.  My guess is that it
started to fail with r243219.

I'm at a conference the rest of this week, but I'll fix this as soon as I'm
back.

[Bug tree-optimization/80520] [7/8 Regression] Performance regression from missing if-conversion

2017-05-03 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

--- Comment #5 from krister.walfridsson at gmail dot com ---
I have extracted a smaller test case.  The loops are generated from

  typedef mersenne_twister_engine<
    uint_fast32_t,
    32, 624, 397, 31,
    0x9908b0dfUL, 11,
    0xffffffffUL, 7,
    0x9d2c5680UL, 15,
    0xefc60000UL, 18, 1812433253UL> mt19937;

and the expansion of the template ends up with loops like

void foo(unsigned long *M)
{
  for (unsigned long k = 0; k < 227; ++k)
    {
      unsigned long y =
        ((M[k] & 0x8000) | (M[k + 1] & 0x7fff));
      M[k] = (M[k + 397] ^ (y >> 1) ^ ((y & 1) ? 2567483615 : 0));
    }
}

which generates the dump described in the bug report.

[Bug tree-optimization/80520] [7/8 Regression] Performance regression from missing if-conversion

2017-04-26 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

--- Comment #3 from krister.walfridsson at gmail dot com ---
You can see the issue in the generated code with

  int foo(std::mt19937 &gen)
  {
    std::uniform_int_distribution dist(0,99);
    return dist(gen);
  }

too, i.e. it is not just an artifact of the uninteresting use in the
benchmarking loop.

[Bug tree-optimization/80520] New: Performance regression from missing if-conversion

2017-04-25 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

Bug ID: 80520
   Summary: Performance regression from missing if-conversion
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---
Target: x86_64-linux-gnu

Created attachment 41266
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41266&action=edit
Test case demonstrating the problem

The following test case from a CppCon 2016 talk benchmarking different
randomization constructs

  #include <random>

  void foo(std::mt19937 &gen)
  {
    for (int i = 0; i < 10; ++i)
      {
        std::uniform_int_distribution dist(0,99);
        volatile auto x = dist(gen);
      }
  }

runs much slower when compiled with gcc 8.0 (r247084) compared to gcc 6.3
  gcc 6.3.0: 3.9s
  gcc 8.0.0: 7.7s
(compiled as "g++ -O3" on x86_64-linux-gnu).

The benchmark is silly, but it indicates that the heuristics for the branch
optimizations could be improved.

The difference is that the .optimized dump generated by gcc 6.3.0 contains code
segments of the form

  _32 = __y_27 & 1;
  iftmp.1_33 = _32 != 0 ? 2567483615 : 0;
  _34 = _31 ^ iftmp.1_33;
  MEM[base: _97, offset: 0B] = _34;
  ivtmp.35_100 = ivtmp.35_101 + 8;
  if (_94 == ivtmp.35_100)
goto ;
  else
goto ;

where iftmp.1_33 is generated as a cmov, while the same code compiled by gcc
8.0.0 looks like

  _102 = __y_60 & 1;
  if (_102 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [49.50%]:
  _98 = _103 ^ 2567483615;
  MEM[base: _105, offset: 0B] = _98;
  ivtmp.33_91 = ivtmp.33_43 + 8;
  if (_47 == ivtmp.33_91)
goto ; [1.01%]
  else
goto ; [98.99%]

   [49.50%]:
  MEM[base: _105, offset: 0B] = _103;
  ivtmp.33_44 = ivtmp.33_43 + 8;
  if (ivtmp.33_44 == _47)
goto ; [1.01%]
  else
goto ; [98.99%]

and the CPU mispredicts the branch generated from "if (_102 != 0)".
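
As a source-level illustration of the two shapes, the sketch below writes the selection "(y & 1) ? 2567483615 : 0" once with a conditional and once without any branch. The function names are assumptions for illustration, and the branch-free form is just one way to express the same value; it is not a claim about the instructions either compiler version emits.

```c
#include <assert.h>
#include <stdint.h>

/* Conditional form, as in the source expression. */
uint32_t xor_mask_branch(uint32_t y)
{
  return (y & 1u) ? 2567483615u : 0u;
}

/* Branch-free form: 0 - (y & 1) is either 0 or all ones, so the AND
   yields either 0 or the constant. */
uint32_t xor_mask_branchless(uint32_t y)
{
  return (uint32_t) ((0u - (y & 1u)) & 2567483615u);
}

/* Small self-check that both forms agree. */
void check_xor_mask(void)
{
  for (uint32_t y = 0; y < 16; y++)
    assert(xor_mask_branch(y) == xor_mask_branchless(y));
  assert(xor_mask_branchless(0xffffffffu) == 2567483615u);
}
```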

[Bug tree-optimization/79721] New: Scalar evolution introduces signed overflow

2017-02-26 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79721

Bug ID: 79721
   Summary: Scalar evolution introduces signed overflow
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

The function

  int foo(int a, int b)
  {
    int sum = 0;
    for (int i = 0; i < 60000; i++)
      {
        sum += a + i * b;
      }
    return sum;
  }

is transformed to

  int _11;
  int _12;
  int _13;
  int _16;

  [...]

  _16 = b_7(D) + a_8(D);
  _13 = _16 * 59999;
  _12 = b_7(D) * 1799910001;
  _11 = _12 + _13;
  sum_17 = _11 + a_8(D);
  return sum_17;

by scalar evolution when compiled as "gcc -O3 -c bug.c".  The original function
could calculate foo(-30000, 2) without any signed integer overflow, but the
transformed function will overflow in the multiplication _12.
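
The overflow claim can be checked with wider arithmetic. The sketch below assumes a 32-bit int, and it takes the trip count and arguments as parameters; the values 60000 and (-30000, 2) used in the self-check are an assumption chosen because 60000 * 59999 / 2 - 59999 = 1799910001 matches the multiplier in the dump. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

static int fits_in_int(long long v)
{
  return v >= INT_MIN && v <= INT_MAX;
}

/* Runs the original loop in long long and returns 1 if every intermediate
   value (i * b, a + i * b, and the running sum) fits in int. */
int original_loop_fits(int n, int a, int b)
{
  long long sum = 0;
  for (long long i = 0; i < n; i++)
    {
      long long prod = i * b;
      long long term = a + prod;
      sum += term;
      if (!fits_in_int(prod) || !fits_in_int(term) || !fits_in_int(sum))
        return 0;
    }
  return 1;
}

/* Returns 1 if the product b * 1799910001 from the dump fits in int. */
int transformed_product_fits(int b)
{
  return fits_in_int((long long) b * 1799910001LL);
}

/* Self-check under the assumed values. */
void check_scev_overflow(void)
{
  assert(original_loop_fits(60000, -30000, 2) == 1);
  assert(transformed_product_fits(2) == 0);
}
```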

[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550

2017-02-06 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #3 from krister.walfridsson at gmail dot com ---
Correction: -fno-split-paths does not help the trunk compiler. But it restores
the result when using the r242550 compiler...

[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550

2017-02-06 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #2 from krister.walfridsson at gmail dot com ---
No, I get the same reduced performance when using -fno-split-paths

[Bug tree-optimization/79390] New: 10% performance drop in SciMark2 LU after r242550

2017-02-06 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Bug ID: 79390
   Summary: 10% performance drop in SciMark2 LU after r242550
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 40677
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40677&action=edit
The relevant source code and generated asm before/after this change

The dense LU matrix factorization test from the old SciMark2
(http://math.nist.gov/scimark) used in the Phoronix compiler test suite has
regressed 10% compared to the November trunk when run on Intel i7 6800K
Broadwell (compiled with "-O3 -march=native"). GCC 6 generated much slower
code, so this is not a regression compared to released versions of the
compiler.

The regression was introduced in r242550:

r242550 | wschmidt | 2016-11-17 15:22:17 +0100 (tor, 17 nov 2016) | 18 lines

[gcc]

2016-11-17  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
Richard Biener  <rguent...@suse.de>

PR tree-optimization/77848
* tree-if-conv.c (tree_if_conversion): Always version loops unless
the user specified -ftree-loop-if-convert.

[gcc/testsuite]

2016-11-17  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
Richard Biener  <rguent...@suse.de>

PR tree-optimization/77848
* gfortran.dg/vect/pr77848.f: New test.

and has the effect that the pivot-finding loop

int LU_factor(int M, int N, double **A,  int *pivot)
{
  int minMN =  M < N ? M : N;
  int j=0;

  for (j=0; j<minMN; j++)
  {
    /* find pivot in column j and  test for singularity. */

    int jp=j;
    int i;

    double t = fabs(A[j][j]);
    for (i=j+1; i<M; i++)
    {
      double ab = fabs(A[i][j]);
      if ( ab > t)
      {
        jp = i;
        t = ab;
      }
    }

    pivot[j] = jp;
    ...

is transformed. The perf output seems to say that this is due to bad branch
prediction, but I do not understand x86 assembler well enough to determine
the cause (or to say whether it really is a bug or just some random thing
the compiler cannot know about...)

[Bug tree-optimization/79389] New: 30% performance regression in SciMark2 MonteCarlo

2017-02-06 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79389

Bug ID: 79389
   Summary: 30% performance regression in SciMark2 MonteCarlo
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 40676
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40676&action=edit
The relevant source code and generated assembler before/after this change

The MonteCarlo test from the old SciMark2 (http://math.nist.gov/scimark) used
in the Phoronix compiler test suite has regressed 30% compared to GCC 6.3 when
run on Intel i7 6800K Broadwell (compiled with "-O3 -march=native").

The regression was introduced in r238005:

r238005 | rguenth | 2016-07-05 15:25:47 +0200 (tis, 05 jul 2016) | 7 lines

2016-07-05  Richard Biener  <rguent...@suse.de>

* gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):
Handle empty else block.
(is_feasible_trace): Likewise.
(split_paths): Likewise.


and has the effect that the if-statement in the loop

for (count=0; count<Num_samples; count++)
{
  double x= Random_nextDouble(R);
  double y= Random_nextDouble(R);
  if ( x*x + y*y <= 1.0)
    under_curve ++;
}

is changed from a cmov to a branch, which mispredicts.
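
For reference, the conditional increment can also be written without an if-statement at the source level. The sketch below uses precomputed sample arrays instead of Random_nextDouble, and the function names are assumptions for illustration; it only shows that the two forms count the same samples and makes no claim about the code GCC generates for either.

```c
#include <assert.h>

/* If-statement form, as in the SciMark2 loop. */
int count_with_branch(const double *x, const double *y, int n)
{
  int under_curve = 0;
  for (int i = 0; i < n; i++)
    if (x[i] * x[i] + y[i] * y[i] <= 1.0)
      under_curve++;
  return under_curve;
}

/* The comparison evaluates to 0 or 1, which is added unconditionally. */
int count_without_branch(const double *x, const double *y, int n)
{
  int under_curve = 0;
  for (int i = 0; i < n; i++)
    under_curve += (x[i] * x[i] + y[i] * y[i] <= 1.0);
  return under_curve;
}

/* Self-check on four hand-picked points; three lie inside the unit circle. */
void check_counts(void)
{
  const double xs[] = { 0.0, 0.5, 1.0, 0.9 };
  const double ys[] = { 0.0, 0.5, 1.0, 0.1 };
  assert(count_with_branch(xs, ys, 4) == 3);
  assert(count_without_branch(xs, ys, 4) == 3);
}
```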

[Bug c++/79205] New: ICE in create_tmp_var, at gimple-expr.c:473

2017-01-23 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79205

Bug ID: 79205
   Summary: ICE in create_tmp_var, at gimple-expr.c:473
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

gcc version 7.0.1 20170124 (r244846) ICEs when compiling the following (using
the command line "g++ -c -std=c++1z bug.cpp")

#include <tuple>

int foo(std::tuple<int> t)
{
  auto [x0] = t;
  return x0;
}

[Bug middle-end/78847] New: pointer arithmetic from c++ ranged-based for loop not optimized

2016-12-17 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78847

Bug ID: 78847
   Summary: pointer arithmetic from c++ ranged-based for loop not
optimized
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

GCC has some problems eliminating overhead from C++ range-based for loops.
Consider the program

#include 
#include 
#include 
using string_view = std::experimental::string_view;

class Foo {
    constexpr static size_t Length = 9;
    char ascii_[Length];
public:
    Foo();
    string_view view() const {
        return string_view(ascii_, Length);
    }
};

void testWithLoopValue(const Foo foo, size_t ptr, char *buf_) {
  for (auto c : foo.view())
    buf_[ptr++] = c;
}

compiled as
  g++ -O3 -S -std=c++1z k.cpp


ldist determines that this is a memcpy of length expressed as _14

  _18 = (unsigned long) [(void *) + 9B];
  _4 = _ + 1;
  _3 = (unsigned long) _4;
  _16 = _18 + 1;
  _14 = _16 - _3;

and dom3 improves this to

  _18 = (unsigned long) [(void *) + 9B];
  _3 = (unsigned long) [(void *) + 1B];
  _16 = _18 + 1;
  _14 = _16 - _3;

But this is not further simplified to 9 until combine, where it is too late,
and a call to memcpy is generated instead of the expected inlined version.
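
The arithmetic behind "simplified to 9" can be sketched as follows. The operands in the dumps above are incomplete, so this assumes that both addresses are offsets (9 and 1 bytes) from the same base address; the function name and the variable names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the dom3 dump with "base" standing in for the common address:
   _18 = base + 9, _3 = base + 1, _16 = _18 + 1, _14 = _16 - _3.
   The base cancels, so the result is 9 for every base. */
uintptr_t memcpy_length(uintptr_t base)
{
  uintptr_t v18 = base + 9;
  uintptr_t v3 = base + 1;
  uintptr_t v16 = v18 + 1;
  return v16 - v3;
}

/* Self-check, including a base where the additions wrap around. */
void check_memcpy_length(void)
{
  assert(memcpy_length(0) == 9);
  assert(memcpy_length(4096) == 9);
  assert(memcpy_length(UINTPTR_MAX - 3) == 9);
}
```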

[Bug tree-optimization/78343] New: Loop is not eliminated

2016-11-13 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78343

Bug ID: 78343
   Summary: Loop is not eliminated
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

GCC 6 and trunk generate inefficient code for the loop

  unsigned int test(unsigned int quant)
  {
    unsigned int sum = 0;
    for (unsigned int i = 0; i < quant; ++i){
      sum += quant;
    }

    return sum;
  }

as noted in the tweet
https://twitter.com/lefticus/status/797593368037642244?s=09.

This is a regression introduced in r233207; GCC used to generate

  test:
        movl    %edi, %eax
        imull   %edi, %eax
        ret

before r233207, and it generates a meaningless loop

  test:
        testl   %edi, %edi
        je      .L4
        xorl    %edx, %edx
  .L3:
        addl    $1, %edx
        cmpl    %edx, %edi
        jne     .L3
        movl    %edi, %eax
        imull   %edi, %eax
        ret
  .L4:
        xorl    %eax, %eax
        ret

after that change.
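
The equivalence that the older code generation relied on can be checked directly: adding quant to itself quant times equals quant * quant, and because unsigned arithmetic wraps modulo 2^N this holds even when the product does not fit. The names test_loop and test_closed_form are assumptions for illustration.

```c
#include <assert.h>

/* The loop from the report. */
unsigned int test_loop(unsigned int quant)
{
  unsigned int sum = 0;
  for (unsigned int i = 0; i < quant; ++i)
    sum += quant;
  return sum;
}

/* Closed form corresponding to the movl/imull sequence. */
unsigned int test_closed_form(unsigned int quant)
{
  return quant * quant;
}

/* Self-check, including a value whose square wraps for 32-bit unsigned. */
void check_loop_closed_form(void)
{
  assert(test_loop(0) == 0);
  assert(test_loop(7) == 49);
  assert(test_loop(70000) == test_closed_form(70000));
}
```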

[Bug tree-optimization/78035] Inconsistency between address comparison and alias analysis

2016-11-02 Thread krister.walfridsson at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78035

krister.walfridsson at gmail dot com changed:

           What    |Removed |Added
                 CC|        |krister.walfridsson at gmail dot com

--- Comment #9 from krister.walfridsson at gmail dot com ---
Doesn't this just introduce more inconsistencies in the compiler? For example

extern int a;
extern int b;

int foo(void)
{
    a = 1;
    b = 5;
    a++;
    return &a != &b;
}

optimizes to

foo:
        movl    $a, %eax
        movl    $5, b(%rip)
        movl    $2, a(%rip)
        cmpq    $b, %rax
        setne   %al
        movzbl  %al, %eax
        ret

That is, the accesses to a and b are optimized as if they are distinct, even
though the compiler keeps the comparison of the addresses.

I cannot think of a reasonable use case where you must handle comparisons of
the addresses as currently implemented while allowing other optimizations as if
the objects are distinct, so I'd say the bug from the original description is
that we were "being too conservative in bar"...

[Bug fortran/48244] iso-c-binding support missing on NetBSD (with patch)

2013-06-29 Thread krister.walfridsson at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48244

--- Comment #2 from krister.walfridsson at gmail dot com ---
> --- Comment #1 from Dominique d'Humieres dominiq at lps dot ens.fr ---
> Is there still maintainers/users of NetBSD?

There are still users.  But my paperwork is not in order since I
changed employer some years ago, so I am not allowed to commit
anything... :(

   /Krister
