[Bug c++/57176] copy elision with function arguments passed by value

2024-03-25 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57176

Marc Glisse  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #5)
> Is it worth keeping this open if we're not allowed to make this change?

Probably not since wg21 explicitly added text to forbid this optimization. It
belongs in some non-existent wg21 feature request list...
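
(A minimal sketch of the situation, not taken from the bug report: the instrumented type Tracked and the helper names are hypothetical. Returning a by-value parameter is not a candidate for copy elision, so one move constructor call remains even though the argument and the final result are both constructed in place.)

```cpp
#include <cassert>

// Hypothetical instrumented type, introduced only for this illustration.
struct Tracked {
  static int copies;
  static int moves;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returning a by-value parameter: the parameter is treated as an rvalue,
// but the move into the return object is not elided.
Tracked pass_through(Tracked x) { return x; }

// Counts the moves caused by a single call to pass_through.
int moves_for_parameter() {
  Tracked::moves = 0;
  Tracked r = pass_through(Tracked());
  (void)r;
  return Tracked::moves;
}
```

With the default settings I would expect exactly one move here, which is the one this PR wanted to remove.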

[Bug c++/58055] [meta-bug] RVO / NRVO improvements

2024-03-25 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58055
Bug 58055 depends on bug 57176, which changed state.

Bug 57176 Summary: copy elision with function arguments passed by value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57176

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)

2023-09-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643

Marc Glisse  changed:

   What|Removed |Added

 CC||glisse at gcc dot gnu.org

--- Comment #2 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #1)
> I am 99% sure this is falls under don't do this as flatten inlines
> everything it can that the function calls ...

Maybe people end up abusing flatten because we are missing a convenient way for
a caller to ask that a call be inlined? From the callee, we can use
always_inline (couldn't this be used on name_original in this testcase?), but
from the caller... Here even a non-recursive version of flatten would have
helped.
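
(For reference, a small sketch of the two existing attributes; the function names are made up for the illustration and have nothing to do with the testcase. always_inline is a request made by the callee, flatten is a request made by the caller that applies to everything it calls.)

```cpp
#include <cassert>

// Callee-side request: inline this function into every caller.
__attribute__((always_inline)) inline int square(int x) { return x * x; }

// An ordinary function with no inlining request of its own.
int cube(int x) { return x * x * x; }

// Caller-side request: flatten asks the compiler to inline, if possible,
// every call made from this body (and, recursively, the calls they make).
__attribute__((flatten)) int sum_of_powers(int x) {
  return square(x) + cube(x);
}
```

There is no attribute in this sketch that would let sum_of_powers ask for cube alone to be inlined, which is the missing piece discussed above.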

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #26 from Marc Glisse  ---
(In reply to CVS Commits from comment #22)
> While the design of these builtins in clang is questionable,
> rather than being say
> unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
> so that it is clear they add two [0, 0xffffffff] range numbers
> plus one [0, 1] range carry in and give [0, 0xffffffff] range
> return plus [0, 1] range carry out, they actually instead
> add 3 [0, 0xffffffff] values together but the carry out
> isn't then the expected [0, 2] value because
> 0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
> but just [0, 1] whether there was any overflow at all.

That is very strange. I always thought that the original intent was for
__builtin_addc to assume that its third argument was in [0, 1] and generate a
single addc instruction on hardware that has it, and the type only ended up
being the same as the others for convenience (also C used not to have a bool
type). The final overflow never being 2 confirms this.

It may be worth discussing with clang developers if they would be willing to
document such a [0, 1] restriction, and maybe have ubsan check it.
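
(A sketch of the semantics I had in mind, written with __builtin_add_overflow. The helper addc32 and its wrappers are hypothetical, this is not the clang builtin. When the carry in is restricted to [0, 1], at most one of the two partial additions can overflow, so the carry out is always in [0, 1].)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes a + b + carry_in for 32-bit operands,
// assuming carry_in is 0 or 1, and reports the carry out.
inline std::uint32_t addc32(std::uint32_t a, std::uint32_t b, bool carry_in,
                            bool* carry_out) {
  std::uint32_t partial = 0;
  std::uint32_t sum = 0;
  bool c1 = __builtin_add_overflow(a, b, &partial);
  bool c2 = __builtin_add_overflow(
      partial, static_cast<std::uint32_t>(carry_in), &sum);
  // If a + b wrapped, partial <= 0xfffffffe, so adding 1 cannot wrap again.
  *carry_out = c1 || c2;
  return sum;
}

// Convenience wrappers for checking one output at a time.
inline std::uint32_t sum_of(std::uint32_t a, std::uint32_t b, bool carry_in) {
  bool carry = false;
  return addc32(a, b, carry_in, &carry);
}
inline bool carry_of(std::uint32_t a, std::uint32_t b, bool carry_in) {
  bool carry = false;
  addc32(a, b, carry_in, &carry);
  return carry;
}
```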

[Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon

2023-01-07 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #12 from Marc Glisse  ---
(In reply to Marc Glisse from comment #11)
> Since I had forgotten where it was, let me write here that it is git branch
> /users/glisse/fenv

Since it became impossible (hooks) to push to that branch a while ago, I should
post somewhere the FIXME file I couldn't push last year:

Looking at LLVM, I notice that my design in the gcc fenv branch seems to be
missing a fundamental piece: it has nothing preventing "normal" operations from
outside from migrating towards the protected region, where they may end up
using an unexpected rounding mode (unprotected doesn't mean any rounding mode,
it means the default one), or setting flags that we will observe.
One idea to prevent this would be to make sure that there are no normal FP
operations in functions that have protected operations (does that mean we
should mark functions? Just checking if there is a protected FP op doesn't work
if we call a function that does the op).
This means that we should turn all FP operations of the function into protected
ones (possibly with more relaxed flags if they are not in the protected
region), and we should also do that whenever inlining mixed functions. And
cross my fingers that the compiler doesn't start using FP ops out of thin air.
Would that be sufficient?
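
(For comparison, the kind of user-level workaround people use today, which is not the design of the fenv branch: route the operands and the result through volatile objects so that the operation cannot migrate out of the region between the two fesetround calls. The name div_up is hypothetical, and this only protects this one operation, not "normal" operations elsewhere in the function.)

```cpp
#include <cassert>
#include <cfenv>

// Divides a by b with rounding towards +infinity, then restores the
// previous rounding mode.
inline double div_up(double a, double b) {
  const int old_mode = std::fegetround();
  std::fesetround(FE_UPWARD);
  // Volatile accesses are side effects: the loads happen after the mode
  // switch and the store happens before the mode is restored.
  volatile double va = a;
  volatile double vb = b;
  volatile double result = va / vb;
  std::fesetround(old_mode);
  return result;
}
```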

[Bug testsuite/108190] g++.target/i386/*pr54700*.C testcases fail on x86_64-mingw

2022-12-21 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108190

--- Comment #6 from Marc Glisse  ---
Indeed, this looks like a common issue (at least with the x86 backend): the
memory load is combined with the comparison before we try combining the
comparison with the blend, and this last combination is then rejected because
it expects a register, not memory. So either we are too eager in merging loads
with instructions, or we reject instructions too early when we could still fix
the operands with an extra load.

[Bug tree-optimization/89317] Ineffective code from std::copy

2022-12-11 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89317

--- Comment #11 from Marc Glisse  ---
(In reply to Richard Biener from comment #10)
> Should be fixed in GCC 13.

If I compile the original testcase with -O3, I get for test2:

  _1 = this_6(D) + 16;
  _2 = &this_6(D)->data1;
  if (_1 != _2)

so we should probably also handle comparisons and not just subtractions. For
this particular testcase, the relevant optimizations still happen and RTL
cleans up the comparison, so it is ok, but the pattern appears in other PRs
like PR 106677.

[Bug tree-optimization/107663] -Wmaybe-uninitialized does not catch an uninitialized variable if its address was passed at -O0 and there was a call before hand

2022-11-14 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107663

--- Comment #1 from Marc Glisse  ---
Gcc often ignores the control flow for alias/escape analysis. v escapes at some
point, and that's enough to prevent gcc from noticing that nothing can have
written to v *before* the use. The same thing also hinders some optimizations;
I am sure there are duplicates in bugzilla.

[Bug c++/107622] Missing optimization of switch-statement

2022-11-11 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107622

--- Comment #7 from Marc Glisse  ---
(Wilhelm, when you post testcases, please post the full file including the
#include lines)

(In reply to Richard Biener from comment #5)
> Did you try -fstrict-enums?

IIUC, even if optimizations using -fstrict-enums were implemented, they would
only help with the first testcase if the number of enum values was a power of
2. For {A,B,C}, -fstrict-enums still considers 3 a valid value.

I have long wanted an attribute to specify that a particular enum is only
allowed to take the values explicitly listed, though I cannot find a relevant
issue in bugzilla at the moment.
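
(To illustrate: for an enum without a fixed underlying type, the values are those of the smallest bit-field that can hold all the enumerators, so for {A,B,C} that is 0..3 and static_cast<E>(3) is a valid value. Lacking the attribute, a per-use workaround is to state the promise with __builtin_unreachable. The names E and weight are made up for this sketch.)

```cpp
#include <cassert>

enum E { A, B, C };

// The caller promises that e is one of the listed enumerators; any other
// value (including the otherwise valid 3) makes the behavior undefined.
inline int weight(E e) {
  switch (e) {
    case A: return 10;
    case B: return 20;
    case C: return 30;
  }
  __builtin_unreachable();
}
```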


Comment #4 is an independent issue where gcc fails to notice that since the
static variable does not escape, it can be replaced with a local constant.

[Bug target/107546] [10/11/12/13 Regression] simd, redundant pcmpeqb and pxor

2022-11-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546

--- Comment #5 from Marc Glisse  ---
typedef signed char v16qs __attribute__((vector_size(16)));
auto bar(v16qs x) { return x < 48; }

clang does expand it as 48 gt x. Gcc however does its usual change to x <= 47,
which it then tries to expand as ~(x > 47). I guess the expansion for x <= y
could be tweaked in the case where one argument is constant to undo what was
done earlier in the pipeline and expand as 48 > x.
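
(A sketch checking that the forms mentioned above are interchangeable for every signed char value, using the generic vector extension. The function names are hypothetical; only the first one corresponds to the testcase.)

```cpp
#include <cassert>

typedef signed char v16qs __attribute__((vector_size(16)));

inline auto lt_48(v16qs x) { return x < 48; }          // as written by the user
inline auto le_47(v16qs x) { return x <= 47; }         // canonicalized form
inline auto not_gt_47(v16qs x) { return ~(x > 47); }   // current expansion
inline auto swapped_gt(v16qs x) { return 48 > x; }     // suggested expansion

// Compares the four forms lane by lane over all 256 input values.
inline bool all_forms_agree() {
  for (int base = -128; base < 128; base += 16) {
    v16qs x = {};
    for (int i = 0; i < 16; ++i) x[i] = static_cast<signed char>(base + i);
    auto a = lt_48(x);
    auto b = le_47(x);
    auto c = not_gt_47(x);
    auto d = swapped_gt(x);
    for (int i = 0; i < 16; ++i)
      if (a[i] != b[i] || a[i] != c[i] || a[i] != d[i]) return false;
  }
  return true;
}
```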

[Bug tree-optimization/107520] New: Optimize std::lerp(d, d, 0.5)

2022-11-03 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107520

Bug ID: 107520
   Summary: Optimize std::lerp(d, d, 0.5)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

In some C++ code I have, it would be convenient if the compiler, possibly with
the help of the standard library, could make the following function cheap,
ideally just the identity. I'll probably end up wrapping lerp with a function
that first checks with __builtin_constant_p if the 2 bounds are equal, but I'll
post this in case people have ideas how to improve things.

#include <cmath>
double f(double d){
  return std::lerp(d, d, .5);
}

Currently, with -O3, we generate

        movapd  %xmm0, %xmm1
        pxor    %xmm0, %xmm0
        comisd  %xmm1, %xmm0
        jnb     .L7
        comisd  %xmm0, %xmm1
        jb      .L6
.L7:
        pxor    %xmm0, %xmm0
        ucomisd %xmm0, %xmm1
        jp      .L6
        je      .L11
.L6:
        movapd  %xmm1, %xmm0
        subsd   %xmm1, %xmm0
        mulsd   .LC1(%rip), %xmm0
        addsd   %xmm1, %xmm0
        maxsd   %xmm1, %xmm0
        ret
        .p2align 4,,10
        .p2align 3
.L11:
        mulsd   .LC1(%rip), %xmm1
        movapd  %xmm1, %xmm0
        addsd   %xmm1, %xmm0
        ret

(clang is better at avoiding the redundant comparison)

With -fno-trapping-math to help a bit, I see at the beginning

  if (d_2(D) == 0.0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 475287355]:
  _7 = d_2(D) * 5.0e-1;
  _10 = _7 * 2.0e+0;

I think that even with the default -fsigned-zeros, simplifying to _10 = d_2(D)
is valid.

Adding -fno-signed-zeros

   [local count: 1073741824]:
  if (d_2(D) == 0.0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 598454470]:
  _13 = d_2(D) - d_2(D);
  _14 = _13 * 5.0e-1;
  __x_15 = d_2(D) + _14;
  if (d_2(D) u>= __x_15)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 299227235]:

   [local count: 1073741825]:
  # _12 = PHI 
  return _12;

_13 is 0 or NaN, which doesn't change for _14, and __x_15 is just d_2, so we
always return d_2.
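
(A sketch that mirrors the arithmetic of the two dumps above in plain C++, without calling std::lerp. The function names are hypothetical. The first function is only meant for d equal to +0 or -0, as in the dump; the second shows that d + (d - d) * 0.5 gives back d for finite d and NaN for infinities.)

```cpp
#include <cassert>
#include <cmath>

// Mirrors "_7 = d * 0.5; _10 = _7 * 2.0" from the d == 0.0 branch.
inline double zero_branch(double d) { return (d * 0.5) * 2.0; }

// Mirrors "_13 = d - d; _14 = _13 * 0.5; __x_15 = d + _14".
inline double general_branch(double d) { return d + (d - d) * 0.5; }
```

For a zero argument the sign is preserved by zero_branch, which is why I think the simplification is valid even with signed zeros.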

[Bug tree-optimization/54346] combine permutations

2022-10-11 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #6 from Marc Glisse  ---
The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any xfail
or anything. Does it only fail because gimple-simplify leaves some dead code
around, so you could update the test to scan the next DCE pass dump instead of
forwprop1? Or are we missing a transformation that just detects a VEC_PERM_EXPR
with an identity permutation?

[Bug tree-optimization/107184] Copy warnings in dump files

2022-10-10 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107184

--- Comment #3 from Marc Glisse  ---
(In reply to Richard Biener from comment #2)
> Confirmed - for array-bounds I added some "array-bound warning for %E"
> printing the SSA name/stmt in the dump file.

Sounds good, I'll try that next time the warning is of the array-bound type.

> What I find useful in tracking down things is to -fdump-tree-FOO-lineno which
> at least gets you the locations in the dump.

Ah, I didn't know that one (-lineno isn't part of -all). It is nice, but with
inlining and all the corresponding source line actually appears hundreds of
times in the dump, and this does not tell me which of those causes the warning.

[Bug tree-optimization/107184] New: Copy warnings in dump files

2022-10-08 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107184

Bug ID: 107184
   Summary: Copy warnings in dump files
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

One irritation with warnings like -Wuse-after-free and all the other
optimization-based warnings is how hard they are to track. Yes, it tells me
where the call is in my code, but that's far from enough. With
-fdump-tree-waccess, I can have some idea of what the code looks like, after
various optimizations, that makes the compiler warn. However, identifying the
relevant statements in the dump file can take a long time, and it remains
faster to break out the debugger on the compiler :-(
It seems that a small thing that could help a bit would be to print a copy of
the warnings and notes in the dump file, next to the relevant statements. Or at
least some easy to find marker.
I most certainly don't claim that this will solve anything, I just see it as a
low (?) hanging fruit.

[Bug c++/107065] GCC treats rvalue as an lvalue

2022-09-30 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107065

--- Comment #13 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #12)
> Doing it on the match.pd side doesn't look right, there could be many other
> optimizations that result in something similar.

$ grep -c non_lvalue match.pd   
12

probably they should be removed and those that were useful should be fixed by
similar techniques as you are considering...

To add one more option to your list, maybe the generic-simplify machinery could
add non_lvalue automatically in some cases? I still prefer your first option
though, tweaking the warning code, which probably expected x!=0 and now gets
!(x==0) or something similar.

[Bug c++/107065] GCC treats rvalue as an lvalue

2022-09-30 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107065

--- Comment #11 from Marc Glisse  ---
Did you try adding "non_lvalue" in match.pd? It looks less intrusive. Although
in the long term your approach seems better and the failures should be fixable.

[Bug c++/107065] GCC treats rvalue as an lvalue

2022-09-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107065

--- Comment #8 from Marc Glisse  ---
(simplify
  (bit_not (bit_not @0))
  @0)

while in another place we have

(simplify
 (bit_and @0 integer_all_onesp)
  (non_lvalue @0))

[Bug middle-end/106805] Undue optimisation of floating-point comparisons

2022-09-01 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106805

--- Comment #2 from Marc Glisse  ---
A problematic optimization pointed out in the discussion:

 (simplify
  (cmp @0 REAL_CST@1)
[...]
   (if (REAL_VALUE_ISNAN (TREE_REAL_CST (@1))
&& !tree_expr_signaling_nan_p (@1)
&& !tree_expr_maybe_signaling_nan_p (@0))
{ constant_boolean_node (cmp == NE_EXPR, type); })
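
(A sketch of the values this folding produces: against a quiet NaN, every comparison is false except !=. The helper name is hypothetical, and the volatile is only there to keep the comparisons from being folded in this example. As far as I understand the discussion, the issue is not the value but that <, <=, > and >= should also raise FE_INVALID, which a folded constant cannot do.)

```cpp
#include <cassert>
#include <limits>

// Returns true if, against a quiet NaN, only x != NaN evaluates to true.
inline bool only_ne_is_true(double x) {
  volatile double volatile_nan = std::numeric_limits<double>::quiet_NaN();
  const double n = volatile_nan;
  const bool lt = x < n;
  const bool le = x <= n;
  const bool gt = x > n;
  const bool ge = x >= n;
  const bool eq = x == n;
  const bool ne = x != n;
  return !lt && !le && !gt && !ge && !eq && ne;
}
```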

[Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon

2022-08-26 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

--- Comment #11 from Marc Glisse  ---
(In reply to Segher Boessenkool from comment #8)
> Thanks for the pointer, I'll find Marc's work.

Since I had forgotten where it was, let me write here that it is git branch
/users/glisse/fenv

[Bug tree-optimization/106247] GCC12 warning in Eigen: array subscript is partly outside array bounds

2022-08-19 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106247

Marc Glisse  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #6 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #2)
> the warning is correct in the sense the load is there in IR, though it looks
> like it is dead (but only because b and a are unused):

#include 
Eigen::Array a;
Eigen::Array b;
void f(){
b.col(0).tail(2) = a.col(1);
}

still warns about some 256 bit code which is still very dead (we later optimize
the whole function to just
  _64 = MEM  [(char * {ref-all}) + 16B];
  MEM  [(char * {ref-all}) + 8B] = _64;
)
so the fact that a and b are unused in the original testcase is not important.

I assume there were good reasons to put the warning this early (VRP1), but it
means that some dead code that will be removed later is still around.

(@Daniel: it can be easier to track things with separate issues, if you have a
different testcase to provide)

[Bug tree-optimization/106677] New: Abstraction overhead with std::views::join

2022-08-18 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106677

Bug ID: 106677
   Summary: Abstraction overhead with std::views::join
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(from https://stackoverflow.com/q/73407636/1918193 )

#include <array>
#include <ranges>
#include <vector>

struct Foo {
auto join() const { return m_array | std::views::join; }
auto direct() const { return std::views::all(m_array[0]); }
std::array<std::vector<int*>, 1> m_array;
};
__attribute__((noinline)) int sum_array(const Foo& foo)
{
int result = 0;
for (int* val : foo.join())
result += *val;
return result;
}
__attribute__((noinline)) int sum_vec(const Foo& foo)
{
int result = 0;
for (int* val : foo.direct())
result += *val;
return result;
}

I am using a snapshot from 20220719 with -std=gnu++2b -O3 and looking at
.optimized dumps.

sum_vec gets relatively nice, short code. sum_array gets something uglier.

  _18 = &foo_5(D)->m_array;
  _6 = foo_5(D) + 24;
  if (_6 != _18)

Err, x != x+24 should be folded to false? Let's add

  if(foo.m_array.begin()==foo.m_array.end())__builtin_unreachable();

to move forward.

  _16 = MEM[(int * const * const &)foo_4(D)];
  _17 = MEM[(int * const * const &)foo_4(D) + 8];
  if (_16 != _17)
goto ; [5.50%]
  else
goto ; [94.50%]

why are we guessing that the vector is probably empty? Let's look at more code

   [local count: 853673669]:
  _10 = [(const struct array *)foo_4(D)]._M_elems;
  _7 = foo_4(D) + 24;
  _16 = MEM[(int * const * const &)foo_4(D)];
  _17 = MEM[(int * const * const &)foo_4(D) + 8];
  if (_16 != _17)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 806721618]:
  _18 = foo_4(D) + 24;

   [local count: 96636762]:
  # SR.89_28 = PHI <_10(2), _18(3)>
  # SR.90_41 = PHI <_16(2), 0B(3)>
  goto ; [100.00%]

   [local count: 923031551]:
  # result_2 = PHI <0(4), result_12(8)>
  # SR.89_13 = PHI 
  # SR.90_30 = PHI 
  if (_7 == SR.89_13)
goto ; [30.00%]
  else
goto ; [70.00%]

   [local count: 276909463]:
  if (SR.90_30 == 0B)
goto ; [16.34%]
  else
goto ; [83.66%]

   [local count: 96636764]:
  # result_31 = PHI 
  return result_31;

(why not _18 = _7 towards the beginning?)
It would be nice if threading could isolate the case of an empty vector: 2 -> 3
-> 4 -> 9 -> 10 -> 11: just return 0, and the rest of the code may become
easier to optimize.

Let me add

  if(foo.m_array[0].begin()==foo.m_array[0].end())__builtin_unreachable();

to avoid the empty vector case as well. This looks better, at least the inner
loop looks normal, but we are still iterating on the elements of m_array, when
we should be able to tell that it has exactly 1 element.
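
(A reduced sketch of the __builtin_unreachable hint used above, on a plain vector so that it does not need ranges. The function names are hypothetical. The check tells the compiler that the empty case cannot happen; calling the function with an empty vector is then undefined.)

```cpp
#include <cassert>
#include <vector>

// Sums the pointed-to values; the caller promises that v is not empty.
inline int sum_nonempty(const std::vector<int*>& v) {
  if (v.begin() == v.end()) __builtin_unreachable();
  int result = 0;
  for (int* val : v) result += *val;
  return result;
}

// Small driver that builds a three-element vector and sums it.
inline int demo_sum() {
  int a = 1, b = 2, c = 3;
  std::vector<int*> v{&a, &b, &c};
  return sum_nonempty(v);
}
```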

[Bug libstdc++/80331] unused const std::string not optimized away

2022-06-05 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331

--- Comment #10 from Marc Glisse  ---
(In reply to AK from comment #9)
> can't repro this with gcc 12.1 Seems like this is fixed?

No. As stated in other comments, it still reproduces with a longer string (or
with -D_GLIBCXX_USE_CXX11_ABI=0).

[Bug libstdc++/105308] Specialize for_each

2022-04-19 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105308

--- Comment #2 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #1)
> I'm unclear what the request is.

The list isn't super clear to me either, any sensible specialization of a
standard algorithm for a standard container. Even simply
ranges::for_each(std::set,*) looks like it could be a bit faster with a
specialization instead of using iterators.

> Are you proposing this for the parallel
> std::for_each with an execution policy?

Yes, that's the first motivation.

> That code comes from the PSTL project which is part of LLVM,
> and maintained by Intel, so enhancements to it should ideally be done 
> upstream.

But the code would need to use private interfaces of libstdc++'s _Rb_tree. Does
PSTL contain a lot of special code, with one variant for libstdc++ / libc++ /
other, that uses internals of the datastructures?

[Bug libstdc++/105308] New: Specialize for_each

2022-04-19 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105308

Bug ID: 105308
   Summary: Specialize for_each
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

Hello,

with a balanced binary tree, as used for instance in std::set or std::map, it
is relatively easy to perform an operation in parallel on all elements (like
for_each): recurse on the 2 subtrees in parallel (and probably assign the top
node to one of the subtrees arbitrarily). Of course there are technical
details, we don't store the size of subtrees so we may want to decide in
advance how deep to switch to sequential, etc. Doing this requires accessing
details of the tree implementation and cannot be done by a user (plus, for_each
doesn't seem to be a customization point...).

I am still confused that we have the traditional for_each, the new for_each
with execution policy, the new range for_each, but no mixed range + execution
policy. This specialization would be easier to implement for a whole tree than
for an arbitrary subrange. It is still possible there, but likely less
balanced, and we may need a first pass to find the common ancestor and possibly
other relevant information (or check if the range is the whole container if
that's possible and only optimize that case).

Possibly some other containers could specialize for_each, although it isn't as
obvious.

Actually, even the sequential for_each could benefit from a specialization for
various containers. Recursing on subtrees is a bit cheaper than having the
iterator move up and down, forward_list could avoid pointing to the previous
element, deque could try to split at block boundaries, etc.

Other algorithms that iterate through a range like reduce, all_of, etc could
also benefit, hopefully most are simple wrappers around others so few would
need a specialization.
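
(A sketch of the sequential case on a hypothetical Node type. This is not libstdc++'s _Rb_tree, which has parent pointers and a header node. The traversal recurses on the two subtrees instead of stepping an iterator up and down; a parallel version could hand the two recursive calls to different workers above some depth.)

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical binary tree node, for illustration only.
struct Node {
  int value;
  std::unique_ptr<Node> left, right;
  explicit Node(int v, std::unique_ptr<Node> l = nullptr,
                std::unique_ptr<Node> r = nullptr)
      : value(v), left(std::move(l)), right(std::move(r)) {}
};

// In-order traversal by recursion on the subtrees.
template <class F>
void for_each_in_order(const Node* n, F& f) {
  if (n == nullptr) return;
  for_each_in_order(n->left.get(), f);
  f(n->value);
  for_each_in_order(n->right.get(), f);
}

// Builds the tree 2 -> (1, 3) and records the visiting order.
inline std::vector<int> demo_in_order() {
  std::unique_ptr<Node> root(new Node(2, std::unique_ptr<Node>(new Node(1)),
                                      std::unique_ptr<Node>(new Node(3))));
  std::vector<int> seen;
  auto record = [&seen](int v) { seen.push_back(v); };
  for_each_in_order(root.get(), record);
  return seen;
}
```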

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2022-04-03 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #6 from Marc Glisse  ---
(blend is now lowered in gimple)

For the integer case, the mix of vector(int) and vector(char) obfuscates things
a bit, we have

__m256i if_else_int (__m256i x, __m256i y)
{
  vector(32) char _4;
  vector(32) char _5;
  vector(32) char _6;
  vector(32)  _7;
  vector(32) char _8; 
  vector(4) long long int _9;
  vector(8) int _10;
  vector(8) int _11;
  vector(8)  _12;
  vector(8) int _13;

   [local count: 1073741824]: 
  _10 = VIEW_CONVERT_EXPR(x_2(D));
  _11 = VIEW_CONVERT_EXPR(y_3(D));
  _12 = _10 > _11;
  _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0,
0, 0, 0, 0 }>;
  _5 = VIEW_CONVERT_EXPR(_13);
  _4 = VIEW_CONVERT_EXPR(y_3(D));
  _6 = VIEW_CONVERT_EXPR(x_2(D));
  _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_7, _4, _6>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;
}

A first step would be to teach gcc that it can do a VEC_COND_EXPR<_12, _11,
_10> with fewer VIEW_CONVERT_EXPR (maybe follow the definition chain of the
condition through trivial ops like <0, view_convert or ?-1:0 until we find a
real comparison _10 > _11, to determine the right size?).

Other steps:

* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd
so we can recognize min/max.

* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float
case, if that's a valid thing to do (NaN, etc).
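
(A sketch of why the integer case is a min, written with the generic vector extension on 4 ints rather than with the AVX intrinsics of the testcase. The names are hypothetical. The blend picks y where x > y and x elsewhere, which is the elementwise minimum.)

```cpp
#include <cassert>

typedef int v4si __attribute__((vector_size(16)));

// Comparison followed by a mask-based blend, as in the dump above.
inline v4si if_else_min(v4si x, v4si y) {
  v4si mask = x > y;                // -1 in lanes where x > y, 0 elsewhere
  return (mask & y) | (~mask & x);  // y where x > y, otherwise x
}

// Checks the blend against a scalar minimum, lane by lane.
inline bool demo_blend_is_min() {
  v4si x = {1, -5, 7, 0};
  v4si y = {2, -6, 7, -1};
  v4si r = if_else_min(x, y);
  for (int i = 0; i < 4; ++i) {
    const int expected = x[i] < y[i] ? x[i] : y[i];
    if (r[i] != expected) return false;
  }
  return true;
}
```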

[Bug tree-optimization/105062] Suboptimal vectorization for reduction with several elements

2022-03-28 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105062

--- Comment #2 from Marc Glisse  ---
(In reply to Richard Biener from comment #1)
> But since not all of the std::max are recognized as
> MAX_EXPR but some only after loop if-conversion

Ah, I hadn't noticed. I tried replacing std::max with a simpler by-value
version so we get MAX_EXPR already in early inline, but that didn't help.
Actually, it made things worse: 

#include 
#include 
#include 
#include 
#include 
#include 

int my_max(int a, int b){ return (a> vec;
  vec.reserve(n);
  std::random_device rd;
  std::default_random_engine re(rd());
  std::uniform_int_distribution rand_int;
  std::uniform_real_distribution rand_dbl;
  for(int i=0;i(vec[i]),std::get<1>(vec[i])));
volatile int noopt0 = sup;
  }
#else
  {
int sup = 0;
for(int i=0;i(vec[i])),std::get<1>(vec[i]));
volatile int noopt1 = sup;
  }
#endif
  auto finish = std::chrono::system_clock::now();
  std::cout << std::chrono::duration_cast(finish -
start).count() << '\n';
}


Now reassoc1 turns the fast code into the slow code before the vectorizer can
detect the reduction chain :-(

[Bug tree-optimization/105062] New: Suboptimal vectorization for reduction with several elements

2022-03-26 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105062

Bug ID: 105062
   Summary: Suboptimal vectorization for reduction with several
elements
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

The testcase is essentially the same as in PR105053, but here this is about
performance, not correctness.

#include 
#include 
#include 
#include 
#include 
#include 

int main(){
  const long n = 1;
  std::vector> vec;
  vec.reserve(n);
  std::random_device rd;
  std::default_random_engine re(rd());
  std::uniform_int_distribution rand_int;
  std::uniform_real_distribution rand_dbl;
  for(int i=0;i(vec[i]),std::get<1>(vec[i])));
volatile int noopt0 = sup;
  }
#else
  {
int sup = 0;
for(int i=0;i(vec[i])),std::get<1>(vec[i]));
volatile int noopt1 = sup;
  }
#endif
  auto finish = std::chrono::system_clock::now();
  std::cout << std::chrono::duration_cast(finish -
start).count() << '\n';
}


I compile with -O3 -march=skylake (originally noticed with -march=native on a
i7-10875H CPU).

The second loop runs in about 60ms, while the first (compiling with -DSLOW)
runs in 80ms. The generated asm also looks very different. For the fast code,
the core loop is

.L64:
        vmovdqu     (%rax), %ymm3
        addq        $64, %rax
        vpunpckldq  -32(%rax), %ymm3, %ymm0
        vpermd      %ymm0, %ymm2, %ymm0
        vpmaxsd     %ymm0, %ymm1, %ymm1
        cmpq        %rdx, %rax
        jne         .L64

which looks nice and compact (well, I think we could do without the vpermd, but
it is already great). Now for the slow code, we have

.L64:
        vmovdqu     (%rax), %ymm0
        vmovdqu     32(%rax), %ymm10
        vmovdqu     64(%rax), %ymm2
        vmovdqu     96(%rax), %ymm9
        vpermd      %ymm10, %ymm6, %ymm8
        vpermd      %ymm0, %ymm7, %ymm1
        vpblendd    $240, %ymm8, %ymm1, %ymm1
        vpermd      %ymm9, %ymm6, %ymm11
        vpermd      %ymm2, %ymm7, %ymm8
        vpermd      %ymm0, %ymm4, %ymm0
        vpermd      %ymm10, %ymm3, %ymm10
        vpermd      %ymm2, %ymm4, %ymm2
        vpermd      %ymm9, %ymm3, %ymm9
        vpblendd    $240, %ymm11, %ymm8, %ymm8
        vpblendd    $240, %ymm10, %ymm0, %ymm0
        vpblendd    $240, %ymm9, %ymm2, %ymm2
        vpermd      %ymm1, %ymm4, %ymm1
        vpermd      %ymm8, %ymm3, %ymm8
        vpermd      %ymm0, %ymm4, %ymm0
        vpermd      %ymm2, %ymm3, %ymm2
        vpblendd    $240, %ymm8, %ymm1, %ymm1
        vpblendd    $240, %ymm2, %ymm0, %ymm0
        vpmaxsd     %ymm0, %ymm1, %ymm1
        subq        $-128, %rax
        vpmaxsd     %ymm1, %ymm5, %ymm5
        cmpq        %rdx, %rax
        jne         .L64

It is unrolled once more than the fast code and contains an excessive amount of
shuffling. If I understand correctly, it vectorizes a reduction with MAX_EXPR
on "sup" but does not consider the operation max(get<0>,get<1>) as being part
of this reduction, so it generates code that would make sense if I used 2
different operations like

  sup=std::max(sup,std::get<0>(vec[i])+std::get<1>(vec[i]))

instead of both being the same MAX_EXPR. Maybe, when we discover a reduction,
we could check if the elements are themselves computed with the same operation
as the reduction and in that case try to make it a "bigger" reduction?
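
(A reduced sketch of the two loop shapes being compared, on a vector of pairs instead of the container of the testcase. The names and the use of std::pair are assumptions made for this illustration. Both compute the same value; the nested form is the one I reported as slower, the chained one as faster. No timings are claimed for this reduced version.)

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<int, int>>;

// sup = max(sup, max(a, b)): the inner max is not seen as part of the
// reduction.
inline int sup_nested(const Pairs& v) {
  int sup = 0;
  for (const auto& p : v) sup = std::max(sup, std::max(p.first, p.second));
  return sup;
}

// sup = max(max(sup, a), b): a single chain of MAX_EXPR on sup.
inline int sup_chained(const Pairs& v) {
  int sup = 0;
  for (const auto& p : v) sup = std::max(std::max(sup, p.first), p.second);
  return sup;
}

// Checks on a small input that the two forms agree.
inline bool demo_same_result() {
  const Pairs v = {{3, 9}, {12, 4}, {7, 7}, {1, 11}};
  return sup_nested(v) == 12 && sup_chained(v) == 12;
}
```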

[Bug tree-optimization/105053] Wrong loop count for scalar code from vectorizer

2022-03-25 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105053

--- Comment #8 from Marc Glisse  ---
Thank you.
I originally noticed the problem with 11.2.0-18 (Debian), so I believe this
will be needed on that branch as well. 10.3.0 looked ok...

[Bug tree-optimization/105053] New: Wrong loop count for scalar code from vectorizer

2022-03-25 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105053

Bug ID: 105053
   Summary: Wrong loop count for scalar code from vectorizer
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

#include 
#include 
#include 
#include 
#include 

int main(){
  const long n = 1;
  std::vector> vec;
  vec.reserve(n);
  std::random_device rd;
  std::default_random_engine re(rd());
  std::uniform_int_distribution rand_int;
  std::uniform_real_distribution rand_dbl;
  for(int i=0;i(vec[i]),std::get<1>(vec[i])));
std::cout << sup << '\n';
  }
  {
int sup = 0;
for(int i=0;i(vec[i])),std::get<1>(vec[i]));
std::cout << sup << '\n';
  }
}

Can output for instance
2147483645
2147483637
compiled with -O3, whereas the 2 numbers should be the same.

If I compare what I get from the first loop with -O3 -fno-tree-loop-vectorize
to the second loop with just -O3, the code is almost identical, except that the
(scalar) code only iterates on 1/4 of the array, as if it was using a bound
meant for a vector. -fno-tree-loop-vectorize seems to be ok.

[Bug tree-optimization/104675] [9/10/11/12 Regression] ICE: in expand_expr_real_2, at expr.cc:9773 at -O with __real__ + __imag__ extraction

2022-02-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104675

--- Comment #6 from Marc Glisse  ---
I am only learning now that bit ops don't exist for complex numbers :-/
I don't really see why not, but that's a different question. Thanks for fixing
this.
Looking to see if I could quickly find other similar issues, I only noticed 2
ICEs

typedef _Complex unsigned T;
T f(T x){
  return (x/2)*2;
}
T g(T x){
  return (x*2)/2;
}

[Bug tree-optimization/104420] New: [12 Regression] Inconsistent checks for X * 0.0 optimization

2022-02-07 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104420

Bug ID: 104420
   Summary: [12 Regression] Inconsistent checks for X * 0.0
optimization
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(from a comment in PR 104389)

/* Maybe fold x * 0 to 0.  The expressions aren't the same
   when x is NaN, since x * 0 is also NaN.  Nor are they the
   same in modes with signed zeros, since multiplying a
   negative value by 0 gives -0, not +0.  Nor when x is +-Inf,
   since x * 0 is NaN.  */
(simplify
 (mult @0 real_zerop@1)
 (if (!tree_expr_maybe_nan_p (@0)
  && (!HONOR_NANS (type) || !tree_expr_maybe_infinite_p (@0))
  && !tree_expr_maybe_real_minus_zero_p (@0)
  && !tree_expr_maybe_real_minus_zero_p (@1))
  @1))

Notice how the comment talks about @0 being a "negative value" while the code
says "!tree_expr_maybe_real_minus_zero_p (@0)", which is not at all the same
thing.

Because tree_expr_maybe_real_minus_zero_p is rather weak, it does not trigger
so often, but still:

double f(int a){
  return a*0.;
}

is optimized to "return 0.;" whereas f(-42) should return -0.
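
The special cases listed in the match.pd comment can be checked at run time. The following is a minimal sketch, not GCC code: `times_zero` and `check_times_zero` are hypothetical helpers, the `volatile` operand is only there to stop the compiler from folding the product, and IEEE 754 arithmetic without -ffast-math is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: multiply by +0.0 at run time. The volatile zero
// keeps the compiler from simplifying x * 0.0 to 0.0.
double times_zero(double x) {
  volatile double zero = 0.0;
  return x * zero;
}

// Asserts the cases the comment enumerates (assumes IEEE 754 semantics).
void check_times_zero() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(!std::signbit(times_zero(42)));   // positive * 0 gives +0
  assert(std::signbit(times_zero(-42)));   // negative * 0 gives -0
  assert(std::isnan(times_zero(inf)));     // Inf * 0 gives NaN
  assert(std::isnan(times_zero(nan)));     // NaN * 0 gives NaN
}
```

The second assertion is the case from the testcase above: an int argument of -42 is negative but can never be -0.0 itself, so a check that only asks "may @0 be minus zero?" does not cover it.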

[Bug tree-optimization/104389] [10/11/12 Regression] HUGE_VAL * 0.0 is no longer a NaN

2022-02-04 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104389

--- Comment #6 from Marc Glisse  ---
Not this bug, but note that the comment and the code don't match in this
transformation: "a negative value" becomes !tree_expr_maybe_real_minus_zero_p
(@0) which is quite different. I am not sure the path with a negative @0 for
which tree_expr_maybe_real_minus_zero_p returns false can be reached though.

[Bug libstdc++/104361] Biased Reference Counting for the standard library

2022-02-03 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104361

--- Comment #2 from Marc Glisse  ---
I looked at this paper for a different project a while ago, and it doesn't seem
like such a good match for C++ in general. While the basic idea looks simple
(use 2 counters, one for the thread that created the object, one for the
others), making it work in all cases is actually a lot of work. In particular
the paper requires a runtime that periodically checks a queue in each thread.

[Bug target/104239] [12 Regression] immintrin.h or x86gprintrin.h headers can't be included

2022-01-26 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104239

--- Comment #2 from Marc Glisse  ---
Thanks for fixing that bug, but don't you still have issues with
NO_WARN_X86_INTRINSICS if you rely on __has_include for immintrin.h?

[Bug c++/104235] New: [12 Regression] ICE: in cp_parser_template_id, at cp/parser.cc

2022-01-25 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104235

Bug ID: 104235
   Summary: [12 Regression] ICE: in cp_parser_template_id, at
cp/parser.cc
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

template 
struct L: M {
  using M::a;
  void p() { this->template a<>; }
};

(accepted by g++-11 and clang++-13)

bug.cc: In member function 'void L::p()':
bug.cc:4:31: internal compiler error: in cp_parser_template_id, at
cp/parser.cc:18367
4 |   void p() { this->template a<>; }
  |   ^
0x719e9c cp_parser_template_id
../../src/gcc/cp/parser.cc:18367
0xfa5beb cp_parser_class_name
../../src/gcc/cp/parser.cc:25694
0xf9bddb cp_parser_qualifying_entity
../../src/gcc/cp/parser.cc:7118
0xf9bddb cp_parser_nested_name_specifier_opt
../../src/gcc/cp/parser.cc:6800
0xf9da5a cp_parser_id_expression
../../src/gcc/cp/parser.cc:6148
0xfa63cf cp_parser_postfix_dot_deref_expression
../../src/gcc/cp/parser.cc:8305
0xf9a103 cp_parser_postfix_expression
../../src/gcc/cp/parser.cc:7904
0xf81eea cp_parser_binary_expression
../../src/gcc/cp/parser.cc:10041
0xf82a4e cp_parser_assignment_expression
../../src/gcc/cp/parser.cc:10345
0xf84579 cp_parser_expression
../../src/gcc/cp/parser.cc:10515
0xf87b97 cp_parser_expression_statement
../../src/gcc/cp/parser.cc:12711
0xf950b7 cp_parser_statement
../../src/gcc/cp/parser.cc:12507
0xf9619d cp_parser_statement_seq_opt
../../src/gcc/cp/parser.cc:12856
0xf96277 cp_parser_compound_statement
../../src/gcc/cp/parser.cc:12808
0xfb6565 cp_parser_function_body
../../src/gcc/cp/parser.cc:25052
0xfb6565 cp_parser_ctor_initializer_opt_and_function_body
../../src/gcc/cp/parser.cc:25103
0xfb746e cp_parser_function_definition_after_declarator
../../src/gcc/cp/parser.cc:31229
0xfb791c cp_parser_late_parsing_for_member
../../src/gcc/cp/parser.cc:32150
0xf8fb2a cp_parser_class_specifier_1
../../src/gcc/cp/parser.cc:26170
0xf90b72 cp_parser_class_specifier
../../src/gcc/cp/parser.cc:26194

[Bug c++/104184] [11/12 Regression] ICE Error reporting routines re-entered. xref_basetypes

2022-01-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104184

--- Comment #4 from Marc Glisse  ---
https://geometrica.saclay.inria.fr/team/Marc.Glisse/tmp/mybug.cc.xz
(1.7M after compression still exceeds the limit)

With -Wall -Wextra

Internal compiler error: Error reporting routines re-entered.
0xec0348 xref_basetypes(tree_node*, tree_node*)
../../src/gcc/cp/decl.cc:15783
0x101d194 instantiate_class_template_1
../../src/gcc/cp/pt.cc:11953
0x101ec31 instantiate_class_template(tree_node*)
../../src/gcc/cp/pt.cc:12311
0x10714d8 complete_type(tree_node*)
../../src/gcc/cp/typeck.cc:143
0xff0ad6 get_template_base
../../src/gcc/cp/pt.cc:23282
0xff2720 unify
../../src/gcc/cp/pt.cc:24348
0xff10d4 unify
../../src/gcc/cp/pt.cc:24499
0xfee75b unify_one_argument
../../src/gcc/cp/pt.cc:22472
0xfffd65 type_unification_real
../../src/gcc/cp/pt.cc:22595
0x1019da9 fn_type_unification(tree_node*, tree_node*, tree_node*, tree_node*
const*, unsigned int, tree_node*, unification_kind_t, int, conversion**, bool,
bool)
../../src/gcc/cp/pt.cc:21923
0xe146d9 add_template_candidate_real
../../src/gcc/cp/call.cc:3544
0xe15633 add_template_candidate
../../src/gcc/cp/call.cc:3632
0xe15633 add_candidates
../../src/gcc/cp/call.cc:6165
0xe1c362 add_candidates
../../src/gcc/cp/call.cc:6051
0xe1c362 build_new_method_call(tree_node*, tree_node*, vec**, tree_node*, int, tree_node**, int)
../../src/gcc/cp/call.cc:11012
0x1039e3d finish_call_expr(tree_node*, vec**,
bool, bool, int)
../../src/gcc/cp/semantics.cc:2788
0xfe96d4 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
../../src/gcc/cp/pt.cc:20780
0xff5e8c tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:16162
0x100494e tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:13423
0xff635e tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:15461

[Bug c++/104184] [11/12 Regression] ICE Error reporting routines re-entered. xref_basetypes

2022-01-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104184

--- Comment #3 from Marc Glisse  ---
comment #1 actually reduces to

struct voider {
  using type = void;
};
template  struct rename : P {};
template  using ignore = voider;
template  typename ignore::type>::type g(T a) {}
void f() { g(1); }

(still questionable and rejected by clang, I think I'll also attach the
compressed initial preprocessed file, in case the reductions hit different
bugs)

[Bug c++/104184] [11/12 Regression] ICE Error reporting routines re-entered. xref_basetypes

2022-01-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104184

--- Comment #2 from Marc Glisse  ---
And the stack trace for comment #1

Internal compiler error: Error reporting routines re-entered.
0xff6b0d tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:16068
0xff5f6d tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:16055
0x100494e tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:13423
0xff635e tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:15461
0xff5f6d tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:16055
0x100494e tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:13423
0xff635e tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:15461
0x1005a77 tsubst_decl
../../src/gcc/cp/pt.cc:14815
0x101d842 instantiate_class_template_1
../../src/gcc/cp/pt.cc:12076
0x101ec31 instantiate_class_template(tree_node*)
../../src/gcc/cp/pt.cc:12311
0x10714d8 complete_type(tree_node*)
../../src/gcc/cp/typeck.cc:143
0x107163d complete_type_or_maybe_complain(tree_node*, tree_node*, int)
../../src/gcc/cp/typeck.cc:156
0xff73d0 tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:16081
0x100494e tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:13423
0xff635e tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:15461
0xee6e31 dump_template_bindings
../../src/gcc/cp/error.cc:486
0xee0619 dump_function_decl
../../src/gcc/cp/error.cc:1805
0xee8602 decl_to_string
../../src/gcc/cp/error.cc:3225
0xee8602 cp_printer
../../src/gcc/cp/error.cc:4396
0x281b82f pp_format(pretty_printer*, text_info*)
../../src/gcc/pretty-print.cc:1475

[Bug c++/104184] [11/12 Regression] ICE Error reporting routines re-entered. xref_basetypes

2022-01-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104184

--- Comment #1 from Marc Glisse  ---
A different reduction from the same code. This one does not compile with clang
anymore. ICEs with -Wall -W, but not if I remove one of them.

using mp_false = struct mp_identity { using type = void; };
template  using mp_if_c = typename T ::type;
template  using mp_at_c = typename mp_if_c::type;
template  using make_arg_list = List;
template  using make_parameter_spec_items = SpecSeq;
template  struct argument_pack {
  using type =
  mp_at_c::type,
  typename Parameters::deduced_listboosttag_keyword_arg,
  mp_false>::type,
  0>;
};
void no_exude();
template  using boost_param_result_465refine_mesh_3 = mp_identity;
template 
typename boost_param_result_465refine_mesh_3<
typename argument_pack::type>::type
refine_mesh_3(ParameterArgumentType0, ParameterArgumentType1,
  ParameterArgumentType2, ParameterArgumentType3,
  ParameterArgumentType4, ParameterArgumentType5 a5) {}
int verify___trans_tmp_1, image_domain;
struct Tester {
  template 
  void verify(C3t3 c3t3, Domain domain, Criteria criteria, Domain_type_tag) {
refine_mesh_3(c3t3, domain, criteria, no_exude, verify___trans_tmp_1,
  verify___trans_tmp_1);
  }
} image_c3t3;
struct Image_tester : Tester {
  void image() {
void criteria();
verify(image_c3t3, image_domain, criteria, int());
  }
};

[Bug c++/104184] New: [11/12 Regression] ICE Error reporting routines re-entered. xref_basetypes

2022-01-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104184

Bug ID: 104184
   Summary: [11/12 Regression] ICE Error reporting routines
re-entered. xref_basetypes
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

This is reduced from valid code (I think) and it still compiles with "g++ -c
-w" or "clang++ -c", although the undefined inline function seems to play a
strong role, so this may not be exactly the same as the original ICE. The call
stack starts the same (xref_basetypes) but in the original it went through
"unify" and only failed with gcc-12 (not 11) with -Wall and -Wextra.

template  using mp_size_t = int;
template  struct mp_list;
template  struct mp_identity { using type = int; };
template  struct mp_inherit : T... {};
template  using mpmf_wrap = mp_identity;
template  using mpmf_unwrap = typename T::type;
template  struct mp_map_find_impl;
template  class M, class... T, class K> 
struct mp_map_find_impl, K> { 
  using U = mp_inherit...>;
  static mp_identity f(mp_identity *); 
  using type = mpmf_unwrap;
};  
template  
using mp_map_find = typename mp_map_find_impl::type;
template  using mp_second = int;
template  struct mp_at_c_impl { 
  using _map = mp_list, int>;
  using type = mp_second>>;
};  
template  using make_arg_list = mp_identity;
template  struct argument_pack { 
  using type = typename mp_at_c_impl<
  typename make_arg_list::type,
  0>::type;
}; 
struct parameters {
  typedef mp_list<> parameter_spec;
}; 
template  
using boost_param_result_39refine_mesh_3 = mp_identity;
template  
inline typename boost_param_result_39refine_mesh_3<
typename argument_pack::type>::type
refine_mesh_3();
int main() { refine_mesh_3(); }

Internal compiler error: Error reporting routines re-entered.
0xec0348 xref_basetypes(tree_node*, tree_node*)
../../src/gcc/cp/decl.cc:15783
0x101d194 instantiate_class_template_1
../../src/gcc/cp/pt.cc:11953
0x101ec31 instantiate_class_template(tree_node*)
../../src/gcc/cp/pt.cc:12311
0x10714d8 complete_type(tree_node*)
../../src/gcc/cp/typeck.cc:143
0x102d168 lookup_base(tree_node*, tree_node*, int, base_kind*, int)
../../src/gcc/cp/search.cc:229
0xe0ea26 standard_conversion
../../src/gcc/cp/call.cc:1403
0xe12484 implicit_conversion_1
../../src/gcc/cp/call.cc:2031
0xe12484 implicit_conversion
../../src/gcc/cp/call.cc:2131
0xe13d7e add_function_candidate
../../src/gcc/cp/call.cc:2465
0xe15349 add_candidates
../../src/gcc/cp/call.cc:6182
0xe1c362 add_candidates
../../src/gcc/cp/call.cc:6051
0xe1c362 build_new_method_call(tree_node*, tree_node*, vec**, tree_node*, int, tree_node**, int)
../../src/gcc/cp/call.cc:11012
0x1039e3d finish_call_expr(tree_node*, vec**,
bool, bool, int)
../../src/gcc/cp/semantics.cc:2788
0xfe96d4 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
../../src/gcc/cp/pt.cc:20780
0xff5e8c tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:16162
0x100494e tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:13423
0xff635e tsubst(tree_node*, tree_node*, int, tree_node*)
../../src/gcc/cp/pt.cc:15461
0x1005a77 tsubst_decl
../../src/gcc/cp/pt.cc:14815
0x101d842 instantiate_class_template_1
../../src/gcc/cp/pt.cc:12076
0x101ec31 instantiate_class_template(tree_node*)
../../src/gcc/cp/pt.cc:12311

[Bug tree-optimization/90433] POINTER_DIFF_EXPR in vectorizer prologue

2021-12-12 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90433

--- Comment #3 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #2)
> The trunk we don't vectorize the code any more .

I thought it might be because we found a way to use memcpy instead, which would
have been good, but no, the vect dump shows an extremely common gcc issue

missed:  not vectorized: more than one data ref in stmt: MEM[(struct
_Tuple_impl *)__cur_14].D.36092 = MEM[(struct _Tuple_impl
&)__first_19].D.36092;

[Bug libstdc++/51653] More compact std::tuple

2021-12-05 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51653

--- Comment #6 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #5)
> Is there anything more to do for this?

Yes. This PR is about having the library reorder the elements of a tuple to
minimize the size, and the current code does not do anything like that. Now
this would be an ABI break, and even if it wasn't we might not want to do that,
so it is ok if a libstdc++ maintainer decides to close it as wontfix.
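
To illustrate what reordering could save, here is a small sketch using plain structs rather than std::tuple. The type names are hypothetical and this says nothing about how libstdc++ lays out tuples internally; it only shows the padding effect that member order has.

```cpp
#include <cassert>
#include <cstddef>

// Members in declaration order: padding is needed around the double.
struct Declared { char a; double b; char c; };

// Same members with the most-aligned one first: the two chars sit
// together and share a single padded tail.
struct Reordered { double b; char a; char c; };

// Number of bytes saved by the reordering on the current ABI.
std::size_t bytes_saved() {
  assert(sizeof(Declared) >= sizeof(Reordered));
  return sizeof(Declared) - sizeof(Reordered);
}
```

On the x86_64 System V ABI the sizes are 24 and 16 bytes, so `bytes_saved()` is 8; other ABIs may give different numbers.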

[Bug libstdc++/103453] New: ASAN detection with clang

2021-11-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103453

Bug ID: 103453
   Summary: ASAN detection with clang
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

Libstdc++ uses __SANITIZE_ADDRESS__ to detect if ASAN is enabled, but with
clang that should be __has_feature(address_sanitizer). This means that
_GLIBCXX_SANITIZE_STD_ALLOCATOR is not automatically defined, and thus defining
_GLIBCXX_SANITIZE_VECTOR has no effect.

(noticed in https://stackoverflow.com/q/70117470/1918193 )
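
A minimal sketch of a detection that covers both spellings follows. The macro `EXAMPLE_HAS_ASAN` and the function `built_with_asan` are made-up names for illustration, not anything libstdc++ defines.

```cpp
#include <cassert>

// GCC defines __SANITIZE_ADDRESS__ under -fsanitize=address.
#if defined(__SANITIZE_ADDRESS__)
#  define EXAMPLE_HAS_ASAN 1
// Clang reports the same thing through __has_feature.
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define EXAMPLE_HAS_ASAN 1
#  endif
#endif

// Neither check fired: assume the sanitizer is off.
#ifndef EXAMPLE_HAS_ASAN
#  define EXAMPLE_HAS_ASAN 0
#endif

constexpr bool built_with_asan() { return EXAMPLE_HAS_ASAN != 0; }
```

The nested `#if` is deliberate: a compiler without `__has_feature` would otherwise fail to parse `__has_feature(address_sanitizer)` in the same directive.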

[Bug c/102760] ICE: in decompose, at wide-int.h:984

2021-10-15 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102760

--- Comment #3 from Marc Glisse  ---
(In reply to Martin Liška from comment #2)
> Likely triggered with r7-821-gc7986356a1ca8e8e.

From Andrew's comment, it looks like the bug is before that transformation,
since it receives a bit_and_expr of type int with an argument of type char, no?

[Bug testsuite/53155] Not parallel: test for -j fails with new make

2021-09-13 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155

--- Comment #6 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #5)
> Hmm, Did something change in make?

It looks like make now splits -j from other flags in MFLAGS, -wkj becomes -kw
-j, so the old filters probably work now...

[Bug sanitizer/97868] warn about using fences with TSAN

2021-09-11 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97868

--- Comment #6 from Marc Glisse  ---
(In reply to pavlick from comment #5)
> Why is there false positive and no warning about the unsupported feature
> (atomic_thread_fence)?

You are probably using an old version of gcc. With a recent one, this prints

In function 'void std::atomic_thread_fence(std::memory_order)',
inlined from 'void Test::add()' at 3.cc:14:22:
/usr/lib/gcc-snapshot/include/c++/12/bits/atomic_base.h:126:26: warning:
'atomic_thread_fence' is not supported with '-fsanitize=thread' [-Wtsan]
  126 |   { __atomic_thread_fence(int(__m)); }
  | ~^~

[Bug libstdc++/58338] Add noexcept to functions with a narrow contract

2021-08-31 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58338

Marc Glisse  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

--- Comment #16 from Marc Glisse  ---
No idea if there are low hanging fruits. I think the original idea was to get
consensus on the idea to add noexcept in various places, and this seems well
accepted now.
At some point (back when I thought I would have enough free time) my plan was
to implement some form of noexcept(auto) as an extension, I think most of the
remaining places where we may want to add noexcept would benefit from that. The
effort and risk in working around the lack of this feature (writing 10+ lines
of noexcept(...), is_nothrow_*, etc) make it not worth it to me.
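
For readers unfamiliar with the workaround being described, this is roughly what the hand-written version looks like for a trivial function. It is only a sketch: `exchange_values` and `MayThrow` are hypothetical names, and noexcept(auto) itself is not available, which is the point.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// The exception specification has to repeat, by hand, every operation
// the body performs on T.
template <class T>
void exchange_values(T& a, T& b) noexcept(
    std::is_nothrow_move_constructible<T>::value &&
    std::is_nothrow_move_assignable<T>::value) {
  T tmp(std::move(a));
  a = std::move(b);
  b = std::move(tmp);
}

// A type whose move operations may throw, to show the condition propagating.
struct MayThrow {
  MayThrow() = default;
  MayThrow(MayThrow&&) noexcept(false) {}
  MayThrow& operator=(MayThrow&&) noexcept(false) { return *this; }
};

static_assert(noexcept(exchange_values(std::declval<int&>(),
                                       std::declval<int&>())), "");
static_assert(!noexcept(exchange_values(std::declval<MayThrow&>(),
                                        std::declval<MayThrow&>())), "");
```

Even for three statements the condition is already two traits long, and it silently goes stale if the body changes.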

[Bug rtl-optimization/43147] SSE shuffle merge

2021-08-25 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #17 from Marc Glisse  ---
(In reply to Hongtao.liu from comment #15)
> The issue can also be solved by folding __builtin_ia32_shufps to gimple
> VEC_PERM_EXPR,

Didn't you post a patch to do that last year? What happened to it?

[Bug c++/101795] (x > QNaNf) is not a constant expression

2021-08-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101795

--- Comment #1 from Marc Glisse  ---
Hint: -fno-trapping-math lets it compile.
It should probably be accepted in a manifestly_const_eval context, although
some in the committee wanted to prevent the use of NaN (and sometimes even
infinity!) in constant expressions...

[Bug tree-optimization/94356] Missed optimisation: useless multiplication generated for pointer comparison

2021-08-04 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94356

--- Comment #6 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #5)
> Hmm, the following is worse:

That looks like a separate issue. We have fold_comparison for GENERIC, and
match.pd has related patterns for integers, or for pointers with ==, but not
for pointers with <. Strange, I thought I had added those, possibly together
with pointer_diff since the behavior is similar.

[Bug libstdc++/101659] _GLIBCXX_DEBUG mode for std::optional ?

2021-07-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101659

--- Comment #1 from Marc Glisse  ---
I already see some "__glibcxx_assert(this->_M_is_engaged());" in the code,
which IIUC should be enabled by _GLIBCXX_ASSERTIONS (and a fortiori by
_GLIBCXX_DEBUG). Did you actually try it?

[Bug c++/101651] New: constexpr write to simd vector element

2021-07-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101651

Bug ID: 101651
   Summary: constexpr write to simd vector element
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(adapted from https://stackoverflow.com/q/68517921/1918193)

#ifdef WORK
 #include <array>
 typedef std::array<char, 16> vec;
#else
 typedef char vec __attribute__((vector_size(16)));
#endif
constexpr auto gen () {
    vec ret{};
    for (int i = 0; i < sizeof(vec); ++i) {
        ret[i] = 2;
    }
    return ret;
};
constexpr auto m = gen();


c.cc:9:23:   in 'constexpr' expansion of 'gen()'
c.cc:9:24: error: modification of '(char [16])ret' is not a constant expression
9 | constexpr auto m = gen();
  |^

However, with -DWORK to use std::array instead of the vector extension, it
compiles just fine, so there shouldn't be any strong obstacle to implement
this.

[Bug tree-optimization/101639] New: vectorization with bool reduction

2021-07-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

Bug ID: 101639
   Summary: vectorization with bool reduction
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

bool f(char* p, long n)
{
  bool r = true;
  for(long i = 0; i < n; ++i)
    r &= (p[i] != 0);
  return r;
}

is not vectorized, while if I simply declare r as char instead of bool, it is
(not quite optimal since it fails to pull &1 out of the loop, but that's a
separate issue).
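
For reference, the char variant mentioned above would look like this. `f_char` is a hypothetical name; whether the loop is actually vectorized depends on the GCC version and flags, so treat this as a sketch of the source change only.

```cpp
#include <cassert>

// Same reduction as f, but the accumulator is a char instead of a bool.
bool f_char(char* p, long n)
{
  char r = 1;
  for(long i = 0; i < n; ++i)
    r &= (p[i] != 0);   // r stays 1 only while every element is nonzero
  return r;
}
```

Both versions compute the same result; only the type of the accumulator differs.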

[Bug c++/91099] constexpr vs -frounding-math

2021-07-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91099

Marc Glisse  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Marc Glisse  ---
Interestingly, Jakub's patch has the same issue my patch here had: "defeated by
constexpr-caching if we change flag_rounding_math in the middle of a
translation unit".

constexpr double f(){return 1./3.;}
#if BUG
__attribute__((optimize("no-rounding-math")))
double g(){
  double d=f();
  return d;
}
#endif
__attribute__((optimize("rounding-math")))
double h(){
  double d=f();
  return d;
}

The presence of g changes the code we generate for h. At least we don't seem to
reuse the cache from a different value of manifestly_const_eval, so maybe
changing rounding_math is just not supported and this goes in the list of
issues with attribute optimize.

[Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily

2021-07-26 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #7 from Marc Glisse  ---
The same strategy to implement arithmetic shift in terms of logical shift works
not just for vector>>vector but also vector>>scalar and scalar>>scalar. But it
is probably not worth the trouble indeed, especially since your target patch is
ready :-)

[Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily

2021-07-26 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #5 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #2)
> for arithmetic V[24]DImode >> V[24]DImode
> logical ((x >> y) ^ (0x8000000000000000ULL >> y)) - (0x8000000000000000ULL
> >> y)
> can be used.

I guess it would be complicated to try and implement this fallback strategy in
a generic way so other modes/targets could benefit.
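
A scalar sketch of that fallback, to make the identity concrete. It assumes the mask is the 64-bit sign bit 2^63 (the value 9223372036854775808 that clang broadcasts in the original report); `sra_via_logical` is a hypothetical name, not a GCC function.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic right shift of a 64-bit value using only a logical shift,
// XOR and subtraction. Requires y < 64.
std::int64_t sra_via_logical(std::int64_t x, unsigned y) {
  assert(y < 64);
  const std::uint64_t sign = std::uint64_t{1} << 63;  // 0x8000000000000000
  const std::uint64_t ux = static_cast<std::uint64_t>(x);
  const std::uint64_t s = sign >> y;  // where the sign bit lands after the shift
  // If the shifted sign bit was set, XOR clears it and the subtraction
  // borrows through the upper bits, filling them with ones. If it was
  // clear, XOR sets it and the subtraction clears it again.
  return static_cast<std::int64_t>(((ux >> y) ^ s) - s);
}
```

For example, `sra_via_logical(-8, 1)` gives -4 and `sra_via_logical(16, 2)` gives 4. The vector code clang emits applies the same three operations lane by lane.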

[Bug middle-end/56873] vector shift lowered to scalars

2021-07-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56873

Marc Glisse  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Marc Glisse  ---
Indeed, I now get sensible code with -mxop. Not so with -mavx2, but that seems
independent and I filed it as PR 101611.

[Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily

2021-07-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

Bug ID: 101611
   Summary: AVX2 vector arithmetic shift lowered to scalar
unnecessarily
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

Stealing the example from PR 56873

#define SIZE 32
typedef long long veci __attribute__((vector_size(SIZE)));

veci f(veci a, veci b){
  return a>>b;
}

but compiling with -O3 -mavx2 this time, gcc produces scalar code

        vmovq           %xmm1, %rcx
        vmovq           %xmm0, %rax
        vpextrq         $1, %xmm0, %rsi
        sarq            %cl, %rax
        vextracti128    $0x1, %ymm0, %xmm0
        vpextrq         $1, %xmm1, %rcx
        vextracti128    $0x1, %ymm1, %xmm1
        movq            %rax, %rdx
        sarq            %cl, %rsi
        vmovq           %xmm0, %rax
        vmovq           %xmm1, %rcx
        vmovq           %rdx, %xmm5
        sarq            %cl, %rax
        vpextrq         $1, %xmm1, %rcx
        movq            %rax, %rdi
        vpextrq         $1, %xmm0, %rax
        vpinsrq         $1, %rsi, %xmm5, %xmm0
        sarq            %cl, %rax
        vmovq           %rdi, %xmm4
        vpinsrq         $1, %rax, %xmm4, %xmm1
        vinserti128     $0x1, %xmm1, %ymm0, %ymm0
        ret

while clang outputs much shorter vector code

        vpbroadcastq    .LCPI0_0(%rip), %ymm2   # ymm2 = [9223372036854775808,9223372036854775808,9223372036854775808,9223372036854775808]
        vpsrlvq         %ymm1, %ymm2, %ymm2
        vpsrlvq         %ymm1, %ymm0, %ymm0
        vpxor           %ymm2, %ymm0, %ymm0
        vpsubq          %ymm2, %ymm0, %ymm0
        retq

[Bug libstdc++/58909] C++11's condition variables fail with static linking

2021-07-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58909

Marc Glisse  changed:

   What|Removed |Added

 CC||glisse at gcc dot gnu.org

--- Comment #27 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #26)
> If you create a new thread of execution then you'll get a non-weak reference
> to pthread_create, which should cause libpthread.so to be linked even with
> -Wl,--as-needed (and for static linking it will work if libpthread.a has a
> single .o with all symbols).
> 
> If you don't actually have multiple threads in your program, then things
> like condition_variable and once_flag can end up using the stubs in
> libc.so.6 which are no-ops. But since you don't have multiple threads, it's
> probably not a major problem.

For call_once, it throws an exception whether there are other threads or not,
it isn't a no-op.
(as you might guess, this code is in a library, I don't control if threads are
used elsewhere)

> Most uses of std::once_flag would be better
> done with a local static variable anyway (the exception being non-static
> data members of classes).

I build trees with a once_flag in each node, there is no way I can do that with
static variables.

> With glibc 2.34 the problem goes away, so I'm not sure it's worth investing
> much effort in libstdc++ trying to work around the problems with weak
> symbols.

Ok. I just wanted to advertise that the issue is not limited to static linking.

(too bad you had to revert the new call_once implementation)

[Bug libstdc++/58909] C++11's condition variables fail with static linking

2021-07-22 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58909

--- Comment #25 from Marc Glisse  ---
Note that this also affects dynamic linking with -Wl,--as-needed (which some
platforms use by default).

#include <mutex>
int main(){
  std::once_flag o;
  std::call_once(o, [](){});
}

$ g++ b.cc -lpthread && ldd ./a.out
linux-vdso.so.1 (0x7ffca7b6)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6
(0x7f9c809ac000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f9c807e7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f9c806a3000)
/lib64/ld-linux-x86-64.so.2 (0x7f9c80bd4000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
(0x7f9c80689000)

No libpthread there :-(

(using -pthread instead of -lpthread works, but some build systems like cmake
use -lpthread by default)

[Bug bootstrap/49908] -lm missing after -lmpc

2021-07-19 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49908

--- Comment #5 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #4)
> GCC builds now with the c++ which means this won't show up.

Just because g++ has an implicit -lm doesn't mean that any random 3rd-party C++
compiler does too.
(I don't really care about this PR though, I don't mind it being closed)

[Bug tree-optimization/101501] [11/12 Regression] wrong code at -O3 on x86_64-linux-gnu

2021-07-18 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101501

--- Comment #2 from Marc Glisse  ---
unsigned char a = 55;
int main() {
  unsigned char c;
d:
  c = a-- * 52;
  if (c)
    goto d;
  __builtin_printf("%d\n", a);
}


outputs 40 at -O3 instead of 255, and already fails with gcc-8. Cunroll seems
confused about the number of iterations of this loop.
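
To see why 255 is the expected output: 52 = 4 * 13, so a * 52 truncated to 8 bits is zero only when a is a multiple of 64. Starting from 55 and counting down, that first happens at a = 0, and the post-decrement then wraps a around to 255. The sketch below restates the testcase as a function without the goto; `final_value_of_a` is a hypothetical name.

```cpp
#include <cassert>

// Same computation as the testcase, returning a instead of printing it.
int final_value_of_a() {
  unsigned char a = 55;
  unsigned char c;
  do {
    // The product is computed in int, then truncated to 8 bits.
    c = static_cast<unsigned char>(a-- * 52);
  } while (c);
  return a;
}
```

The loop body runs 56 times (a = 55 down to 0) before c becomes zero.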

[Bug middle-end/101063] #pragma STDC FENV_ACCESS ON: wrong code generation: instructions leading to side effects may not be generated

2021-06-14 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101063

--- Comment #1 from Marc Glisse  ---
> Note 1: Under -Wall gcc generates warning:
> :5: warning: ignoring '#pragma STDC FENV_ACCESS' [-Wunknown-pragmas]

That seems like a huge hint, this is not implemented in gcc. You can find
several existing PR in this bugzilla.

There is a branch refs/users/glisse/heads/fenv that was kind of functional last
time I tried, but I'll never have time to polish it.

[Bug middle-end/54400] recognize vector reductions

2021-06-08 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400

--- Comment #8 from Marc Glisse  ---
(In reply to Richard Biener from comment #7)
> (note avoiding hadd in the reduc pattern was intended).

Indeed. Except with -Os, or if a processor with a fast hadd appears,
vectorising this doesn't bring anything. It doesn't hurt either though.

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-07 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #4 from Marc Glisse  ---
(In reply to Denis Yaroshevskiy from comment #3)
> Is what @Andrew Pinski copied enough?

I think so (it is missing the command line), although one example with an
integer type could also help in case floats turn out to have a different issue.

> -ftrapping-math causes clang to stop doing this optimisation.

Note that -ftrapping-math is on by default with gcc (PR 54192), but
-fno-trapping-math wouldn't solve your problem, we are missing other things.

[Bug rtl-optimization/95405] Unnecessary stores with std::optional

2021-06-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95405

--- Comment #5 from Marc Glisse  ---
GIMPLE doesn't know about calling conventions, that's something that only
"appears" during expansion to RTL.
Still, I don't claim to understand what is going on here.

[Bug rtl-optimization/95405] Unnecessary stores with std::optional

2021-06-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95405

--- Comment #3 from Marc Glisse  ---
For a self-contained version, see below. Notice how the extra constructor in
_Optional_payload_base changes the generated code, or storing directly a
_Optional_payload_base instead of _Optional_payload in optional

struct _Optional_payload_base {
  long _M_value;
  bool _M_engaged = false;
  _Optional_payload_base() = default;
  ~_Optional_payload_base() = default;
  _Optional_payload_base(const _Optional_payload_base&) = default;
  _Optional_payload_base(_Optional_payload_base&&) = default;

  _Optional_payload_base(double,float);
};

struct _Optional_payload : _Optional_payload_base { };

struct optional
{
  _Optional_payload _M_payload;
};

optional foo();
long bar()
{
  auto r = foo();
  if (r._M_payload._M_engaged)
    return r._M_payload._M_value;
  else
    return 0L;
}

[Bug target/100929] gcc fails to optimize less to min for SIMD code

2021-06-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Marc Glisse  changed:

   What|Removed |Added

Version|og10 (devel/omp/gcc-10) |11.1.0
   Keywords||missed-optimization
  Component|c++ |target
   Severity|normal  |enhancement
 Target||x86_64-*-*

[Bug c++/100929] gcc fails to optimize less to min for SIMD code

2021-06-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #1 from Marc Glisse  ---
Please attach your testcases to the bug report. godbolt links are nice
complements, but not considered sufficient here.

We don't lower the comparison or the blend in GIMPLE (yet). I think Hongtao Liu
is doing blends right now. I don't know if there would be issues for
comparisons (with -ftrapping-math for instance?).

If you write (x

[Bug target/100784] ICE: Segmentation fault, contains_struct_check(tree_node*, tree_node_structure_enum, char const*, int, char const*)

2021-05-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100784

--- Comment #2 from Marc Glisse  ---
Do we need to punt if there is no lhs?
(with optimization the call should be removed as pure)
I probably won't have time to try it for a while.

[Bug c++/63164] unnecessary calls to __dynamic_cast

2021-05-26 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63164

Marc Glisse  changed:

   What|Removed |Added

   Last reconfirmed||2021-05-26
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Marc Glisse  ---
I was going to file exactly the same RFE for dynamic_cast and final types
(preferably it should also work if 'final' is only detected by LTO, but that
shouldn't block an easier front-end patch), so confirmed.
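
To illustrate the request, here is a minimal sketch. The class and function names are made up for this example and do not come from the PR. When the target type is final and the source type is a public, non-virtual, unambiguous base, the only object for which the cast can succeed is one whose dynamic type is exactly the target, so comparing type information should be enough:

```cpp
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
};

// 'final' guarantees that no class can derive from Derived.
struct Derived final : Base {
    int value = 42;
};

struct Other : Base {};

// What the source says: a call into the runtime (__dynamic_cast).
Derived* cast_with_dynamic_cast(Base* b) {
    return dynamic_cast<Derived*>(b);
}

// Hand-written equivalent for this hierarchy: since Derived is final,
// the cast succeeds exactly when the dynamic type is Derived itself.
Derived* cast_with_typeid(Base* b) {
    if (b != nullptr && typeid(*b) == typeid(Derived))
        return static_cast<Derived*>(b);
    return nullptr;
}
```

A front-end patch would presumably compare vtable pointers rather than call typeid, but that detail is an assumption on my side.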

[Bug c++/100746] NRVO should not introduce aliasing

2021-05-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100746

--- Comment #1 from Marc Glisse  ---
PR 80740 ?

[Bug tree-optimization/100366] spurious warning - std::vector::clear followed by std::vector::insert(vec.end(), ...) with -O2

2021-05-05 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100366

--- Comment #7 from Marc Glisse  ---
It seems to help if we save the values before the allocation in vector.tcc,
although I cannot promise it won't pessimize something else... And that's just
a workaround, not a solution.

@@ -766,13 +766,16 @@
  {
const size_type __len =
  _M_check_len(__n, "vector::_M_range_insert");
+   pointer __old_start(this->_M_impl._M_start);
+   pointer __old_finish(this->_M_impl._M_finish);
+   pointer __old_end_of_storage(this->_M_impl._M_end_of_storage);
pointer __new_start(this->_M_allocate(__len));
pointer __new_finish(__new_start);
__try
  {
__new_finish
  = std::__uninitialized_move_if_noexcept_a
- (this->_M_impl._M_start, __position.base(),
+ (__old_start, __position.base(),
   __new_start, _M_get_Tp_allocator());
__new_finish
  = std::__uninitialized_copy_a(__first, __last,
@@ -780,7 +783,7 @@
_M_get_Tp_allocator());
__new_finish
  = std::__uninitialized_move_if_noexcept_a
- (__position.base(), this->_M_impl._M_finish,
+ (__position.base(), __old_finish,
   __new_finish, _M_get_Tp_allocator());
  }
__catch(...)
@@ -790,12 +793,12 @@
_M_deallocate(__new_start, __len);
__throw_exception_again;
  }
-   std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
+   std::_Destroy(__old_start, __old_finish,
  _M_get_Tp_allocator());
_GLIBCXX_ASAN_ANNOTATE_REINIT;
-   _M_deallocate(this->_M_impl._M_start,
- this->_M_impl._M_end_of_storage
- - this->_M_impl._M_start);
+   _M_deallocate(__old_start,
+ __old_end_of_storage
+ - __old_start);
this->_M_impl._M_start = __new_start;
this->_M_impl._M_finish = __new_finish;
this->_M_impl._M_end_of_storage = __new_start + __len;

[Bug tree-optimization/100366] spurious warning - std::vector::clear followed by std::vector::insert(vec.end(), ...) with -O2

2021-05-05 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100366

--- Comment #6 from Marc Glisse  ---
So, apart from the small missed PHI optimization, this is probably the common
issue that since operator new is replaceable, we can't really assume that it
does not clobber anything, and that hurts optimizations :-(
Not sure if there would be any convenient workaround for this specific case.
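
As a sketch of why this matters (the counter and the helper function are hypothetical, invented for this illustration): a program is allowed to provide its own global operator new, and that replacement may write to memory the caller can observe. Values loaded before the allocation therefore cannot be assumed unchanged after it, unless the compiler can see the definition.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical global state touched by a user-provided allocator.
std::size_t g_allocations = 0;

// Replacement allocation function: it has a visible side effect.
void* operator new(std::size_t n) {
    ++g_allocations;
    if (void* p = std::malloc(n != 0 ? n : 1))
        return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Reads the global before and after one allocation and returns the
// difference, showing that the allocation changed observable memory.
std::size_t count_after_allocation() {
    const std::size_t before = g_allocations;
    int* volatile p = new int(7);  // volatile pointer: the allocation cannot be elided
    const std::size_t after = g_allocations;
    delete p;
    return after - before;
}
```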

[Bug tree-optimization/100366] spurious warning - std::vector::clear followed by std::vector::insert(vec.end(), ...) with -O2

2021-05-05 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100366

--- Comment #5 from Marc Glisse  ---
(In reply to Martin Sebor from comment #2)
> The IL looks like the warning is justified:

The memcpy call is dead code, we just fail to notice it.

>[local count: 230225493]:
>   # prephitmp_42 = PHI <_6(4), _7(3)>

This is always _6, because in bb 3 we have _6 == _7.

>   pretmp_67 = vec_2(D)->D.33449._M_impl.D.32762._M_start;
>   _69 = prephitmp_42 - pretmp_67;

Always 0.

>[local count: 220460391]:
>   MEM  [(char * {ref-all})_155] = pretmp_72;
>   _50 = vec_2(D)->D.33449._M_impl.D.32762._M_finish;
>   _Num_51 = _50 - prephitmp_42;

Always 0, in bb 4 we copy _M_start in _M_finish if they are not already equal.

(sorry for the wrong FRE comment earlier)

Note that if I replace operator new/delete with malloc/free

inline void* operator new(std::size_t n){return __builtin_malloc(n);}
inline void operator delete(void*p)noexcept{__builtin_free(p);}
inline void operator delete(void*p,std::size_t)noexcept{__builtin_free(p);}

we optimize quite a bit more and the warning disappears.

[Bug tree-optimization/100366] spurious warning - std::vector::clear followed by std::vector::insert(vec.end(), ...) with -O2

2021-05-02 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100366

Marc Glisse  changed:

   What|Removed |Added

   Last reconfirmed||2021-05-02
 Ever confirmed|0   |1
  Component|c++ |tree-optimization
   Keywords||diagnostic,
   ||missed-optimization
 Status|UNCONFIRMED |NEW

--- Comment #1 from Marc Glisse  ---
Assuming the warning happens during the strlen pass, we are still missing a lot
of optimizations at that point

  if (_6 != _7)
goto ; [70.00%]
  else
goto ; [30.00%]

   [local count: 322122544]:
  _158 = _7 - _6;

once VRP2 (2 passes after strlen) replaces _158 with 0 and propagates it, maybe
the code becomes nice enough to avoid confusing this fragile warning (I didn't
check).

Before FRE3, we have

  _6 = vec_2(D)->D.33506._M_impl.D.32819._M_start;
  _7 = vec_2(D)->D.33506._M_impl.D.32819._M_finish;
  if (_6 != _7)
goto ; [70.00%]
  else
goto ; [30.00%]

   [local count: 1073741824]:
  _5 = MEM[(char * const &)vec_2(D) + 8];
  MEM[(struct __normal_iterator *)] ={v} {CLOBBER};
  MEM[(struct __normal_iterator *)]._M_current = _5;
  __position = D.33862;
  _12 = MEM[(const char * const &)vec_2(D)];
  _13 = MEM[(const char * const &)&__position];
  _14 = _13 - _12;

and after FRE3

   [local count: 1073741824]:
  _5 = MEM[(char * const &)vec_2(D) + 8];
  MEM[(struct __normal_iterator *)] ={v} {CLOBBER};
  MEM[(struct __normal_iterator *)]._M_current = _5;
  __position = D.33862;
  _14 = _5 - _6;

Only PRE manages to notice that _5 is the same as _7, which is already late.
And it then takes until VRP2 to realize that _7 - _6 must be 0 in the else
branch of _6 != _7.

* I am not sure why FRE manages to optimize _12 and not _5, that seems like the
first thing to check (maybe the +8 means it is obviously "partial")
* I don't know if some other pass than VRP could learn that b-a is 0 if not
a!=b.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #12 from Marc Glisse  ---
(In reply to rguent...@suse.de from comment #11)

> For PR7
> I have prototyped a forwprop patch to try constant folding
> stmts with all-constant PHIs, thus in this case c$_M_value_2 > 0,
> when there's only a single use of it

Maybe we could handle any case where trying to fold the single use (counting
x*x as a single use of x) with each possible value satisfies is_gimple_val (or
whatever the condition is to be allowed in a PHI, and without introducing a use
of a ssa_name before it is defined), so that things like PHI & X would
simplify. But the constant case is indeed the most important, and should allow
the optimization in this PR before the vectorizer using reassoc1.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #8 from Marc Glisse  ---
PR96480 would be my guess.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #7 from Marc Glisse  ---
Some key steps in the optimization:
PRE turns PHI<-1,0,1> > 0 into PHI<0,0,1>
reassoc then combines the operations (it didn't in gcc-10)
forwprop+phiopt cleans up (i>0)!=0?1:0 into just i>0.

Having to wait until phiopt4 to get the simplified form is still very long, and
most likely causes missed optimizations in earlier passes. But nice progress!
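
For readers without the testcase at hand, the equivalence being optimized can be sketched as follows. This emulates the -1/0/1 result of i<=>0 with plain integers so that it does not require C++20; the function names are made up for the example.

```cpp
// Emulates the three possible outcomes of (i <=> 0) as -1, 0 or 1.
constexpr int three_way(int i) {
    return i < 0 ? -1 : (i == 0 ? 0 : 1);
}

// The form produced by the source code: compare the three-way result with 0.
constexpr bool greater_via_three_way(int i) {
    return three_way(i) > 0;
}

// The form we want after optimization.
constexpr bool greater_direct(int i) {
    return i > 0;
}

static_assert(greater_via_three_way(-5) == greater_direct(-5), "negative input");
static_assert(greater_via_three_way(0) == greater_direct(0), "zero input");
static_assert(greater_via_three_way(7) == greater_direct(7), "positive input");
```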

[Bug c++/100322] Switching from std=c++17 to std=c++20 causes performance regression in relationals

2021-04-28 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100322

--- Comment #7 from Marc Glisse  ---
PR94589 then.

[Bug tree-optimization/99046] New: [[gnu::const]] function needs noexcept to be recognized as loop invariant

2021-02-09 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99046

Bug ID: 99046
   Summary: [[gnu::const]] function needs noexcept to be
recognized as loop invariant
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(from https://stackoverflow.com/q/66100945/1918193)

double x[1000] = {};
[[gnu::const]] double* g(double* var);
void f() {
for (int i = 1; i < 1000; i++) {
g(x)[i] = (g(x)[i-1] + 1.0) * 1.001;
}
}

g++ -O3 eliminates half of the calls to g, but fails to move it to a single
call before the loop, while llvm does just that. Gcc does manage it if I mark f
as noexcept or nothrow. Whether const functions may throw seems debatable, but
if they do throw, I expect them to do so consistently, and since the loop has
at least one iteration and starts with this call, the transformation seems safe
to me.
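
The transformation asked for can be written by hand as below. This is only a sketch: the definition of g and the two arrays are assumptions added so that the example is self-contained, since the testcase only declares g.

```cpp
double a[1000] = {};
double b[1000] = {};

// Hypothetical definition; the report only has a declaration.
[[gnu::const]] double* g(double* var) { return var; }

// Loop as in the report: g is called in every iteration.
void fill_calling_g_each_time() {
    for (int i = 1; i < 1000; i++) {
        g(a)[i] = (g(a)[i - 1] + 1.0) * 1.001;
    }
}

// Hoisted form: a single call before the loop. This is valid because
// the loop runs at least once and its first action is that same call.
void fill_with_hoisted_call() {
    double* p = g(b);
    for (int i = 1; i < 1000; i++) {
        p[i] = (p[i - 1] + 1.0) * 1.001;
    }
}
```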

[Bug target/98962] New: Perform bitops on floats directly with SSE

2021-02-03 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98962

Bug ID: 98962
   Summary: Perform bitops on floats directly with SSE
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

(from https://stackoverflow.com/q/66023408/1918193 )

float f(float a){
  unsigned ai;
  __builtin_memcpy(&ai, &a, 4);
  unsigned ri = ai ^ (1U << 31);
  float r;
  __builtin_memcpy(&r, &ri, 4);
  return r;
}

results in

movd    %xmm0, %eax
addl    $-2147483648, %eax
movd    %eax, %xmm0

while llvm simplifies it to

xorps   .LCPI0_0(%rip), %xmm0
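
For reference, a portable sketch of the same operation, assuming a 32-bit IEEE float (the function names are mine). Flipping the top bit negates the value, and flipping bit 31 of a 32-bit integer gives the same result as adding 2^31 modulo 2^32, which would explain why an addl can stand in for the xor:

```cpp
#include <cstdint>
#include <cstring>

// Negates a float by toggling its sign bit through an integer copy.
float flip_sign(float a) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &a, sizeof bits);
    bits ^= UINT32_C(1) << 31;
    float r;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Toggling bit 31 and adding 2^31 agree modulo 2^32.
constexpr bool add_matches_xor(std::uint32_t bits) {
    return (bits ^ (UINT32_C(1) << 31))
        == static_cast<std::uint32_t>(bits + (UINT32_C(1) << 31));
}

static_assert(add_matches_xor(0u), "zero");
static_assert(add_matches_xor(0x3FC00000u), "sign bit clear");
static_assert(add_matches_xor(0xBFC00000u), "sign bit set");
```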

[Bug tree-optimization/60770] disappearing clobbers

2021-01-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770

--- Comment #14 from Marc Glisse  ---
(In reply to Orgad Shaneh from comment #13)
> The case described in comment 1 doesn't issue a warning with GCC 10.

It does for me with -Wall -O (you need at least some optimization). If there is
still a problem, you need to open a new issue.

[Bug middle-end/98709] gcc optimizes bitwise operations, but doesn't optimize logical ones

2021-01-17 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98709

--- Comment #1 from Marc Glisse  ---
At the end of gimple, we have
  _6 = a_3(D) ^ b_4(D);
  _1 = ~_6;
  _2 = a_3(D) == b_4(D);
  _7 = _1 & _2;
I guess we are missing a simplification of ~(a^b) to a==b for bool (similar to
~(a!=b) but we canonicalize != to ^).
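
A quick exhaustive check of that identity for bool, as a sketch. In GIMPLE, ~ on a 1-bit boolean acts as logical negation, which is spelled ! in the C++ below; the function names are invented for the example.

```cpp
// ~(a ^ b) on 1-bit booleans, written with ! in C++.
constexpr bool not_xor(bool a, bool b) { return !(a ^ b); }

// The simplified form.
constexpr bool is_equal(bool a, bool b) { return a == b; }

// All four input combinations agree.
static_assert(not_xor(false, false) == is_equal(false, false), "00");
static_assert(not_xor(false, true) == is_equal(false, true), "01");
static_assert(not_xor(true, false) == is_equal(true, false), "10");
static_assert(not_xor(true, true) == is_equal(true, true), "11");
```

With that simplification, _1 and _2 above would be the same value and _7 would reduce to a single comparison.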

[Bug d/98607] GDC merging computations but rounding mode has changed

2021-01-15 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98607

--- Comment #9 from Marc Glisse  ---
Since I doubt gdc handles rounding modes correctly for scalars, I think you can
ignore this issue in the implementation of the vector intrinsics for now (same
as we do in C and C++).

Note that gcc isn't alone here, llvm doesn't implement pragma fenv_access
either, and even visual studio, which does implement it for scalars, fails for
vectors. I did not test with Intel's compiler.

[Bug target/98698] New: atomic load to FPU registers

2021-01-15 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98698

Bug ID: 98698
   Summary: atomic load to FPU registers
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

#include <atomic>
std::atomic<double> a;
double f(){ return a.load(std::memory_order_relaxed); }

is compiled by g++ to

movq    a(%rip), %rax
movq    %rax, %xmm0
ret

As far as I understand, a direct movsd to xmm0 would still be atomic, and
that's indeed what llvm outputs.

[Bug c++/98556] [8/9/10/11 Regression] ICE: 'verify_gimple' failed since r8-4821-g1af4ebf5985ef2aa

2021-01-06 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98556

--- Comment #4 from Marc Glisse  ---
The result of the subtraction is supposed to be an integer type, and is instead
an enum based on that underlying type? Maybe the verification code needs
tweaking to allow that.

[Bug target/98167] [x86] Failure to optimize operation on indentically shuffled operands into a shuffle of the result of the operation

2020-12-08 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167

--- Comment #8 from Marc Glisse  ---
(In reply to Richard Biener from comment #4)
> We already handle IX86_BUILTIN_SHUFPD there but not IX86_BUILTIN_SHUFPS for
> some reason.

https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html
I was checking with just one builtin if this was the right approach, and never
got to extend it to others, sorry. Handling shufps in a similar way seems good
to me, if anyone has time to do it.

[Bug sanitizer/97868] New: warn about using fences with TSAN

2020-11-16 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97868

Bug ID: 97868
   Summary: warn about using fences with TSAN
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

The thread sanitizer (-fsanitize=thread) does not handle C++
atomic_thread_fence. This is barely acknowledged in the documentation, but
causes a number of users to waste a lot of time trying to understand how the
reported race could occur. Ideally, it would be supported, but that seems hard.
What does seem doable is adding a warning for programs using fences with tsan.
I don't really care if it is a compile-time warning, or a runtime warning, and
in the second case whether it appears as soon as a fence is executed or only as
a note after reported races, as long as I get a hint about it.

[Bug tree-optimization/97085] [11 Regression] aarch64, SVE: ICE in gimple_expand_vec_cond_expr since r11-2610-ga1ee6d507b

2020-09-24 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97085

--- Comment #6 from Marc Glisse  ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #4)
> > I would be happy with a revert of that patch, if the ARM backend gets fixed,
> > but indeed a missed optimization should not cause an ICE.
> 
> Not sure what the ARM backend issue is.

PR 96528

> Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that
> we are (try to...) be careful to only synthesize ones supported "directly"
> by the target.

After vector lowering, yes. But before that, the front-end can produce
vec_cond_expr for vector types that are not supported. Ah, you probably meant
synthesize them from optimization passes, ok.

> For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the
> AVX512/SVE case) I don't think the targets support ?: natively but they
> have bitwise instructions for this case.  That means we could 'simply'
> expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z
> to be of the same type of course].  I wondered whether we ever
> need to translate between, say, vector and vector
> where lowering ?: this way would require '[un]packing' one of the vectors.

I still need to go back to the introduction of those types to understand why
vector exists at all...

> True, unless you go to bitwise ops.  For scalar flag ? bool_a : bool_b
> ?: isn't the natural representation either - at least I'm not aware
> of any pattern transforming (a & b) | (c & ~b) to b ? a : c for
> precision one integer types ;)

There are PRs asking for this transformation (and for transformations that this
one would enable).
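
For reference, the bitwise form of the select quoted above, shown on a scalar integer mask as a sketch (bit_select is a name made up for this example, and a real mask vector would apply the same operation per element):

```cpp
#include <cstdint>

// Takes the bits of y where mask is 1 and the bits of z where mask is 0,
// i.e. the bitwise expansion of mask ? y : z.
constexpr std::uint32_t bit_select(std::uint32_t mask,
                                   std::uint32_t y,
                                   std::uint32_t z) {
    return (y & mask) | (z & ~mask);
}

static_assert(bit_select(0xFFFFFFFFu, 0x1234u, 0xABCDu) == 0x1234u, "all ones selects y");
static_assert(bit_select(0u, 0x1234u, 0xABCDu) == 0xABCDu, "all zeros selects z");
static_assert(bit_select(0xFF00u, 0x1234u, 0xABCDu) == 0x12CDu, "mixed mask");
```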

[Bug tree-optimization/97085] [11 Regression] aarch64, SVE: ICE in gimple_expand_vec_cond_expr since r11-2610-ga1ee6d507b

2020-09-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97085

--- Comment #4 from Marc Glisse  ---
I would be happy with a revert of that patch, if the ARM backend gets fixed,
but indeed a missed optimization should not cause an ICE.

(In reply to Richard Biener from comment #2)
> At least we're not at all expecting to have a VEC_COND_EXPR where
> the comparison feeding the mask has different operand modes than the
> VEC_COND_EXPR result mode.

Ah, I see why that might cause trouble, although I think supporting this makes
sense, when the modes have the same size or when we use AVX512-style bool
vectors.

> We'd also want to add verification if we do not want
> VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR.

I understand that a VEC_COND_EXPR which outputs a vector of bool (à la AVX512)
is a different thing from a VEC_COND_EXPR which outputs a "true" vector, but
VEC_COND_EXPR still looks like the right tree code to represent both.

(In reply to Richard Biener from comment #3)
> +/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> +   types are compatible.  */
> +(simplify
> + (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2)
> + (if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && types_match (type, TREE_TYPE (@0)))
> +  (if (integer_zerop (@1) && integer_all_onesp (@2))
> +   (bit_not @0)
> +   (if (integer_all_onesp (@1) && integer_zerop (@2))
> +@0

Is the test VECTOR_BOOLEAN_TYPE_P necessary?

(sorry, I may not be very reactive these days)

[Bug tree-optimization/96938] Failure to optimize bit-setting pattern when not using temporary

2020-09-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96938

--- Comment #1 from Marc Glisse  ---
With "char tmp" instead of "int tmp", we get the same code as the first
function.

[Bug target/96918] Failure to optimize vector shift left+shift right+or to pshuf

2020-09-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918

--- Comment #5 from Marc Glisse  ---
typedef unsigned short v8i16 __attribute__((vector_size(16)));

v8i16 bswap_epi16(v8i16 x)
{
return (x << 8) | (x >> 8);
}

We do recognize a rotate already in GENERIC

  return x r<< 8;

But this is expanded to

movdqa  %xmm0, %xmm1
psrlw   $8, %xmm0
psllw   $8, %xmm1
por     %xmm1, %xmm0

probably the target could advertise a rotate insn for that mode, restricted to
an argument of 8?

IIRC, I didn't use vector extensions for the corresponding shift intrinsics
because for large shift amounts they set the result to 0. But for a constant
scalar, we could lower the builtin to a shift (or fold to 0).
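
To make the connection with a byte shuffle explicit, here is a scalar sketch for one 16-bit lane (the function names are mine): rotating a 16-bit value by 8 swaps its two bytes, so a rotate restricted to an amount of 8 is a pure byte permutation.

```cpp
#include <cstdint>

// Rotate a 16-bit value left by 8, written as in the testcase.
constexpr std::uint16_t rotl16_by_8(std::uint16_t x) {
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Explicit byte swap of a 16-bit value.
constexpr std::uint16_t swap_bytes16(std::uint16_t x) {
    return static_cast<std::uint16_t>(((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu));
}

static_assert(rotl16_by_8(0x1234) == 0x3412, "bytes are exchanged");
static_assert(rotl16_by_8(0x00FF) == swap_bytes16(0x00FF), "same as a byte swap");
static_assert(rotl16_by_8(0xABCD) == swap_bytes16(0xABCD), "same as a byte swap");
```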

[Bug tree-optimization/96912] Failure to optimize pblendvb pattern

2020-09-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96912

--- Comment #2 from Marc Glisse  ---
With consistent types, we recognize a VEC_COND_EXPR. With inconsistent types, I
guess we would need to reinterpret x and y as v16i8, and reinterpret the result
back to v2i64.

(please keep #include  in your testcases so we can just copy-paste
and compile them, or use long long instead of int64_t)

[Bug tree-optimization/96897] Failure to optimize not+not+dec+and+not to add+or

2020-09-02 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96897

--- Comment #1 from Marc Glisse  ---
We already transform to
return ~(-2 - x) | x;

so this is really asking for
~(-2 - x) --> x + 1
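
The identity follows from ~y == -y - 1 in two's complement: ~(-2 - x) == (2 + x) - 1 == x + 1. A small sketch that checks it on unsigned values, where wrap-around is well defined (the function names are made up):

```cpp
// Left-hand side of the requested simplification.
constexpr unsigned not_of_minus_two_minus(unsigned x) {
    return ~(0u - 2u - x);
}

// Right-hand side.
constexpr unsigned plus_one(unsigned x) {
    return x + 1u;
}

static_assert(not_of_minus_two_minus(0u) == plus_one(0u), "x = 0");
static_assert(not_of_minus_two_minus(41u) == plus_one(41u), "x = 41");
static_assert(not_of_minus_two_minus(~0u) == plus_one(~0u), "x = UINT_MAX wraps to 0");
```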

[Bug tree-optimization/92712] [8/9 Regression] Performance regression with assumed values

2020-09-02 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92712

--- Comment #25 from Marc Glisse  ---
(In reply to Feng Xue from comment #24)
> Another point: if B+-C can be folded to an existing gimple value, we might
> deduce B+-C does not overflow?

We can deduce that loading this value that represents B+-C does not overflow at
runtime (since we aren't computing anything, just copying some other value),
but not that the operation B+-C would not overflow if it was actually
evaluated. So it could help a bit sometimes. We still need to ensure that the
multiplication by A does not overflow though.

[Bug c++/96862] -frounding-math -std=c++2a error: '(1.29e+2 * 6.9314718055994529e-1)' is not a constant expression

2020-08-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96862

--- Comment #8 from Marc Glisse  ---
Should we handle flag_trapping_math at the same time?

[Bug c++/96862] -frounding-math -std=c++2a error: '(1.29e+2 * 6.9314718055994529e-1)' is not a constant expression

2020-08-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96862

--- Comment #5 from Marc Glisse  ---
"[Note: This document does not require an implementation to support the 
FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma 
is supported. As a consequence, it is implementation-defined whether 
these functions can be used to test floating-point status flags, set 
floating-point control modes, or run under non-default mode settings. If 
the pragma is used to enable control over the floating-point environment, 
this document does not specify the effect on floating-point evaluation in 
constant expressions. — end note]"

So the C++ standard lets us choose what we want gcc to do in this case. The C
standard is of course more precise, but using its own definition of constant
expressions

http://port70.net/~nsz/c/c11/n1570.html#F.8.4
"1 An arithmetic constant expression of floating type, other than one in an
initializer for an object that has static or thread storage duration, is
evaluated (as if) during execution; thus, it is affected by any operative
floating-point control modes and raises floating-point exceptions as required
by IEC 60559 (provided the state for the FENV_ACCESS pragma is ''on'').366)

2 EXAMPLE

  #include <fenv.h>
  #pragma STDC FENV_ACCESS ON
  void f(void)
  {
    float w[] = { 0.0/0.0 };  // raises an exception
    static float x = 0.0/0.0; // does not raise an exception
    float y = 0.0/0.0;        // raises an exception
    double z = 0.0/0.0;       // raises an exception
    /* ... */
  }

3 For the static initialization, the division is done at translation time,
raising no (execution-time) floating- point exceptions. On the other hand, for
the three automatic initializations the invalid division occurs at execution
time."

So Jakub's proposition makes sense, fold inexact operations when we have to
(and use default (nearest) rounding in that case, as long as we don't have
pragma fenv_round). Initializing a global (before main starts) also looks like
a place where folding could make sense, although it is less important.

[Bug target/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #14 from Marc Glisse  ---
(In reply to Marc Glisse from comment #13)
> if (HONOR_SIGNED_ZEROS (mode))
>   x2 = copysign (x2, x);

Hmm, I misread the comment, sorry. We already do that, for both floor and ceil.
But we don't use a true copysign, we use ix86_sse_copysign_to_positive which
won't be able to change the sign from - to +. Just changing it to a true
copysign (one extra and or andn) should be enough then?

[Bug target/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #13 from Marc Glisse  ---
x-x does depend on the rounding mode (the transformation in match.pd gets it
wrong, by the way).
If the sign of 0 is the only issue, maybe we can test flag_rounding_math &&
flag_signed_zeros or the corresponding HONOR_*(mode)? There are sensible cases
where rounding matters but not the sign of 0.
As for making the sequence always work... I am not sure there is much better
than if(x2==0)x2=0;. We could also compute -1 in type long (the test isless
should already guarantee that there is no overflow?), that means an extra
conversion from long to double. I see that ix86_expand_floorceildf_32 already
ends with

if (HONOR_SIGNED_ZEROS (mode))
  x2 = copysign (x2, x);

so we could also add that to ix86_expand_floorceil.
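
A small sketch of the claim that x-x depends on the rounding mode. IEEE 754 specifies that an exact zero difference is +0 in every rounding direction except toward negative infinity, where it is -0. The function name is made up, the volatile variables are there only to keep the compiler from folding or moving the subtraction (gcc does not implement pragma fenv_access), and the sketch assumes the platform defines FE_DOWNWARD.

```cpp
#include <cfenv>
#include <cmath>

// Computes x - x at runtime under the given rounding mode and reports
// whether the result is a negative zero.
bool x_minus_x_is_negative_zero(int rounding_mode) {
    const int old_mode = std::fegetround();
    if (std::fesetround(rounding_mode) != 0)
        return false;                  // mode not supported on this platform
    volatile double x = 1.0;           // volatile: forces runtime loads
    volatile double diff = x - x;      // volatile: result is stored before restoring
    std::fesetround(old_mode);
    const double d = diff;
    return d == 0.0 && std::signbit(d);
}
```

Calling it with FE_DOWNWARD should give true and with FE_TONEAREST false, which is the sign difference that the copysign fix-up discussed above has to deal with.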
