[Bug c++/63707] Brace initialization of array sometimes fails if no copy constructor

2020-12-06 Thread eyalroz at technion dot ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63707

Eyal Rozenberg  changed:

   What|Removed |Added

 CC||eyalroz at technion dot ac.il

--- Comment #12 from Eyal Rozenberg  ---
Rejecting valid code, especially valid code which is not contrived and can well
occur in people's real-life work, seems like a high-priority bug.

The last substantive comment here, other than dupe-marking-related comments two
years ago, is comment #8, asking for this to be fixed - four and a half years
ago.

Jonathan and others - please try to prioritize fixing this, and even if you
can't for some reason - at least explain why this can't be fixed promptly.

See also:

https://stackoverflow.com/q/65138048/1593077

[Bug c++/97553] [missed optimization] constexprness not noticed when UBsan enabled

2020-10-26 Thread eyalroz at technion dot ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553

--- Comment #5 from Eyal Rozenberg  ---
(In reply to Jakub Jelinek from comment #4)
> Depends on what you mean by properly.  -O3 can be used with sanitization,
> but expecting the code to be optimized the same way as without sanitization
> is wrong, it is more important to catch as many bugs as possible, and the
> runtime instrumentation slows things down anyway.  The sanitization is not
> meant to be used for production code, only when debugging it.

I wonder, then, if some kind of notice isn't called for when -O3 and UBsan are
used together.

[Bug c++/97553] [missed optimization] constexprness not noticed when UBsan enabled

2020-10-26 Thread eyalroz at technion dot ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553

--- Comment #3 from Eyal Rozenberg  ---
> And, the runtime sanitization intentionally isn't heavily optimized away, 
> because the intent is to detect when the code is invalid, so it can't e.g. 
> optimize away those checks based on assumption that undefined behavior will 
> not happen.

So, doesn't that essentially mean that -O3 cannot properly apply with UBsan?

[Bug c++/97553] New: [missed optimization] constexprness not noticed when UBsan enabled

2020-10-23 Thread eyalroz at technion dot ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97553

Bug ID: 97553
   Summary: [missed optimization] constexprness not noticed when
UBsan enabled
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

(GodBolt example: https://godbolt.org/z/Kvan5c)

Consider the following code:

  #include <string_view>

  constexpr std::string_view f() { return "hello"; }

  static constexpr std::string_view g() {
      auto x { f() };
      return x.substr(1, 3);
  }

  int foo() { return g().length(); }

if you compile it with flags `--std=c++17 -O3`, it results in a pleasant:

  foo():
  mov eax, 3
  ret

but if you also enable undefined-behavior sanitization, i.e. `--std=c++17
-fsanitize=undefined -O3`, then you get a much longer program with UB-related
instrumentation - which is never used.

I'm not sure if it's because some optimizations are disabled with UBsan, in
which case this might be a "misfeature", or whether they're enabled but the
optimization is just missed.
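One way to make the result independent of how the optimizer and the sanitizer interact is to request constant evaluation explicitly. The following is a minimal sketch, assuming C++17; `f()` and `g()` are taken from the example above, while the `constexpr` local and the `static_assert` are additions for illustration and are not part of the reported test case.

```cpp
#include <cstddef>
#include <string_view>

constexpr std::string_view f() { return "hello"; }

constexpr std::string_view g() {
    auto x { f() };
    return x.substr(1, 3);
}

int foo() {
    // Initializing a constexpr variable requires a constant expression, so
    // g().length() has to be evaluated during compilation.
    constexpr std::size_t len = g().length();
    static_assert(len == 3, "substr(1, 3) of \"hello\" has three characters");
    return static_cast<int>(len);
}
```

This does not address the report itself (whether `-O3` should fold the call when UBsan is on); it only shows how a caller can avoid depending on that.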

[Bug c/97274] Need ability to ensure no warning about tmpnam

2020-10-02 Thread eyalroz at technion dot ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97274

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Jonathan Wakely from comment #1)
> The linker issues the warning, because the symbol in glibc is annotated to
> cause a warning. It has nothing to do with GCC.

Hmm. There's still a question of responsibility:

* Supposing at least some part of GCC is aware that a symbol used is annotated
in the library to cause a warning - should it not offer some mechanism for
circumventing that warning? Seeing how it's a "legitimate" standard library
function?
* Otherwise, would this be a bug to file against the linker, or against the
library?

[Bug c/97274] New: Need ability to ensure no warning about tmpnam

2020-10-02 Thread eyalroz at technion dot ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97274

Bug ID: 97274
   Summary: Need ability to ensure no warning about tmpnam
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

If you use tmpnam, or std::tmpnam in C++, you get a linker (not compiler,
linker) warning:

/usr/bin/ld:
CMakeFiles/simpleDrvRuntimePTX.dir/modified_cuda_samples/simpleDrvRuntimePTX/simpleDrvRuntimePTX.cpp.o:
in function `create_ptx_file[abi:cxx11]()':
/home/eyalroz/src/mine/cuda-api-wrappers/examples/modified_cuda_samples/simpleDrvRuntimePTX/simpleDrvRuntimePTX.cpp:105:
warning: the use of `tmpnam' is dangerous, better use `mkstemp'

There should be a way to disable that warning, either when invoking the compiler
without separate linking or simply when invoking the linker. I'm not sure if
this bug should even be filed here, since it's not obvious to me who is
"responsible" for the linker emitting this warning.
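For comparison, the replacement that the linker message suggests looks roughly like this. It is a sketch that assumes a POSIX system; the helper name `make_temp_file` and the `/tmp/exampleXXXXXX` template are made up for illustration. `mkstemp` creates and opens the file itself, which is why it does not carry the same warning annotation as `tmpnam`.

```cpp
#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // close, unlink (POSIX)
#include <string>

// Hypothetical helper: creates an empty temporary file and returns its path,
// or an empty string on failure. The trailing "XXXXXX" is required by mkstemp
// and is overwritten in place with a unique suffix.
std::string make_temp_file(std::string path_template = "/tmp/exampleXXXXXX") {
    int fd = mkstemp(&path_template[0]);
    if (fd == -1) {
        return {};
    }
    close(fd);  // the caller only wants the name in this sketch
    return path_template;
}
```

This sidesteps the warning rather than silencing it, so it does not answer the question of who should provide a suppression mechanism.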

[Bug c++/96283] "undefined vtable" error should indicate which members are missing

2020-07-22 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96283

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Andrew Pinski from comment #1)
> (In reply to Eyal Rozenberg from comment #0)
> > I'm assuming the compiler provides the linker with enough information to
> > realize which virtual methods' implementations are missing
> 
> It does not.  

Ok, still - the linker knows which virtual methods it needs, and it knows which
are provided by each compiled translation unit. Isn't that enough?


> Also technically the C++ ABI can really be changed.

I don't understand how this sentence relates to the previous part of your
reply. If you mean - can change between compilation and linking - that's
theoretically possible but would be the cause of all sorts of trouble.

[Bug c++/96283] New: "undefined vtable" error should indicate which members are missing

2020-07-22 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96283

Bug ID: 96283
   Summary: "undefined vtable" error should indicate which members
are missing
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Consider the following code:

class Base {
public:
    virtual void vmethod();
};

class foo : public Base {
    int x;
    void vmethod() override;
};

int main() {
    foo f;
}

This will yield the errors (irrelevant paths snipped):

ld: prog.o: in function `Base::Base()':
:1: undefined reference to `vtable for Base'
ld: prog.o: in function `foo::foo()':
:6: undefined reference to `vtable for foo'

While this is true, it is a bit confusing. But even supposing I looked up what
this error means and realized what was going on, I would still need to go over
all the methods of one or two of the classes to find the one that's missing its
implementation. In this simple example that's not so difficult, but sometimes
it's quite the nuisance.

I'm assuming the compiler provides the linker with enough information to
realize which virtual methods' implementations are missing, so that the linker
can print an error message saying which methods are still missing after it has
run.

In this specific case, the linker should complain about vmethod() missing its
definition.

GodBolt: https://godbolt.org/z/9Ejn4s
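For context on why the message mentions the vtable rather than the method: under the Itanium C++ ABI that GCC follows on most targets, the vtable of a class is emitted in the translation unit that defines its "key function", i.e. the first virtual function that is neither pure nor inline. The sketch below is a variant of the example above that links; the `int` return type and the virtual destructor are changes made only so the behavior can be checked.

```cpp
class Base {
public:
    virtual ~Base() = default;   // inline, so it is not the key function
    virtual int vmethod();       // key function of Base
};

// Defining the key function out of line is what causes the vtable for Base
// to be emitted in this translation unit.
int Base::vmethod() { return 1; }

class foo : public Base {
    int x = 0;
public:
    int vmethod() override;      // key function of foo
};

// Likewise for foo: without this definition the linker reports
// "undefined reference to `vtable for foo'".
int foo::vmethod() { return 2 + x; }
```

So in the original example, the missing symbol the linker sees is the vtable itself; the undefined `vmethod()` is only the indirect cause.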

[Bug c++/95148] New: -Wtype-limits always-false warning triggered despite comparison being avoided

2020-05-15 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95148

Bug ID: 95148
   Summary: -Wtype-limits always-false warning triggered despite
comparison being avoided
   Product: gcc
   Version: 10.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Consider the following program:

  #include <type_traits>

  int main() {
      unsigned x { 5 };
      return (std::is_signed<decltype(x)>::value and (x < 0)) ? 1 : 0;
  }

when compiling it with GCC versions 11.0 20200511, 10.1, 9.2.1, 8.3.0, I get
the warning:

  a.cpp:5:52: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]

I should not be getting this warning, because when x is unsigned, the
comparison is never performed, due to the short-circuit semantics of `and`.
This can be easily determined by the compiler - and probably is. No less
importantly, the author of such a line in a program clearly specified his/her
intent here with this check. 

clang doesn't seem to issue a warning in this case.
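In generic code, a possible workaround is to keep the comparison out of the instantiation altogether with `if constexpr`. This is a sketch assuming C++17; `is_negative` is a hypothetical helper, not something from the report, and whether a given GCC version stays silent for it should be checked rather than assumed.

```cpp
#include <type_traits>

// Hypothetical helper: the `x < 0` comparison sits in a branch that is
// discarded, and therefore never instantiated, when T is unsigned.
template <typename T>
constexpr bool is_negative(T x) {
    if constexpr (std::is_signed<T>::value) {
        return x < 0;
    } else {
        (void)x;  // x is otherwise unused in this branch
        return false;
    }
}
```

Unlike the short-circuit `and` in the example above, this only works inside a template, because outside one both branches are still fully compiled.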

[Bug libstdc++/94559] Nitpick: std::array constexpr_fill test isn't constexpr

2020-04-11 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94559

Eyal Rozenberg  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Eyal Rozenberg  ---
Sorry, I misread.

[Bug libstdc++/94559] New: Nitpick: constexpr_fill test isn't constexpr

2020-04-11 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94559

Bug ID: 94559
   Summary: Nitpick: constexpr_fill test isn't constexpr
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

This test:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/testsuite/23_containers/array/requirements/constexpr_fill.cc

is named constexpr_fill, but that test is a runtime one.

[Bug c++/42633] hinting gcc that restricted pointer dont alias with members of structs

2020-03-31 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42633

Eyal Rozenberg  changed:

   What|Removed |Added

 CC||eyalroz at technion dot ac.il

--- Comment #5 from Eyal Rozenberg  ---
This bug is marked assigned, and there's a patch, but 10 years have passed.

Torben, Richard - ping.

[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed

2020-03-24 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

--- Comment #6 from Eyal Rozenberg  ---
(In reply to Richard Biener from comment #5)
> DSE part  ... DCE

DSE = Dead Statement Elimination? DCE = Dead Code Elimination?

[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed

2020-03-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

--- Comment #3 from Eyal Rozenberg  ---
(In reply to Marc Glisse from comment #1)

You should probably post that comment on the second, related, bug 94294 - which
is about the fact that GCC keeps the new and delete. This one is strictly about
the population of the string, given that new and delete are called.

[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed

2020-03-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

--- Comment #2 from Eyal Rozenberg  ---
Note:

The bugs also manifest with this slightly simpler program:


#include <string>

int bar() {
    std::string second { "Hey... no small-string optimization for me please!" };
    return 123;
}

See: https://godbolt.org/z/LjmNYi

[Bug tree-optimization/94294] [missed optimization] new+delete of unused local string not removed

2020-03-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294

--- Comment #1 from Eyal Rozenberg  ---
Note:

The bugs also manifest with this simpler program:


#include <string>

int bar() {
    std::string second { "Hey... no small-string optimization for me please!" };
    return 123;
}

See: https://godbolt.org/z/LjmNYi

[Bug tree-optimization/94294] New: [missed optimization] new+delete of unused local string not removed

2020-03-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294

Bug ID: 94294
   Summary: [missed optimization] new+delete of unused local
string not removed
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

(Relevant Godbolt: https://godbolt.org/z/GygbjZ)

This is the second of two apparent bugs, following bug 94293. They both
manifest when compiling the following program:

#include <string>

int bar() {
    struct poor_mans_pair {
        int first;
        std::string second;
    };
    poor_mans_pair p {
        123, "Hey... no small-string optimization for me please!" };
    return p.first;
}

For x86_64, this would ideally compile into:

bar():
mov eax, 123
ret

but when compiling this with GCC 10.0.1 20200322 (or GCC 9.x etc.), we get
assembly which calls operator new(), populates the string, calls operator
delete(), then returns 123:

bar():
sub rsp, 8
mov edi, 51
call    operator new(unsigned long)
movdqa  xmm0, XMMWORD PTR .LC0[rip]
mov esi, 51
mov rdi, rax
movups  XMMWORD PTR [rax], xmm0
movdqa  xmm0, XMMWORD PTR .LC1[rip]
movups  XMMWORD PTR [rax+16], xmm0
movdqa  xmm0, XMMWORD PTR .LC2[rip]
movups  XMMWORD PTR [rax+32], xmm0
mov eax, 8549
mov WORD PTR [rdi+48], ax
mov BYTE PTR [rdi+50], 0
call    operator delete(void*, unsigned long)
mov eax, 123
add rsp, 8
ret
.LC0:
.quad   7935393319309894984
.quad   3273110194895396975
.LC1:
.quad   8007513861377913971
.quad   8386118574366356592
.LC2:
.quad   2338053640980164457
.quad   8314037903514690925

This bug report is about how the allocation and de-allocation are not
elided/optimized-away, even though the std::string variable is local and
unused.

AFAICT, g++ is not required to do this. And, in fact, clang++ doesn't do this
with its libc++. cppreference says that, starting in C++14,

> New-expressions are allowed to elide or combine allocations made 
> through replaceable allocation functions. In case of elision, the
> storage may be provided by the compiler without making the call to 
> an allocation function (this also permits optimizing out unused
> new-expression)

and this is, indeed, the case of an unused new-expression. Well,
eventually-unused. 

Note: I suppose it's theoretically possible that this bug only manifests
because bug 94293 prevents the allocated space from being recognized as
unused; but I can't tell whether that's the case.
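The `mov edi, 51` in the listing is the size requested from `operator new`: the literal has 50 characters plus the terminating NUL. The sketch below checks that arithmetic and, as a rough proxy, compares it against the capacity of an empty `std::string`. Using that capacity as the size of the small-string buffer is an assumption about the library implementation, not a guarantee; `needs_heap_allocation` is a name made up for this example.

```cpp
#include <cstddef>
#include <string>

// The literal from the report, including its terminating NUL.
constexpr char text[] = "Hey... no small-string optimization for me please!";

// 50 characters + 1 NUL = 51 bytes, matching `mov edi, 51` in the listing.
static_assert(sizeof(text) == 51, "unexpected literal size");

// True when the literal cannot fit in the small-string buffer, so that
// constructing a std::string from it has to allocate.
bool needs_heap_allocation() {
    const std::size_t sso_capacity = std::string{}.capacity();
    return (sizeof(text) - 1) > sso_capacity;
}
```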

[Bug c++/94293] New: [missed optimization] Useless statements populating local string not removed

2020-03-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

Bug ID: 94293
   Summary: [missed optimization] Useless statements populating
local string not removed
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

(Relevant Godbolt: https://godbolt.org/z/GygbjZ)

This is the first of two apparent bugs manifesting when compiling the following
program:

#include <string>

int bar() {
    struct poor_mans_pair {
        int first;
        std::string second;
    };
    poor_mans_pair p {
        123, "Hey... no small-string optimization for me please!" };
    return p.first;
}

For x86_64, this would ideally compile into:

bar():
mov eax, 123
ret

but when compiling this with GCC 10.0.1 20200322 (or GCC 9.x etc.), we get
assembly which calls operator new(), populates the string, calls operator
delete(), then returns 123:

bar():
sub rsp, 8
mov edi, 51
call    operator new(unsigned long)
movdqa  xmm0, XMMWORD PTR .LC0[rip]
mov esi, 51
mov rdi, rax
movups  XMMWORD PTR [rax], xmm0
movdqa  xmm0, XMMWORD PTR .LC1[rip]
movups  XMMWORD PTR [rax+16], xmm0
movdqa  xmm0, XMMWORD PTR .LC2[rip]
movups  XMMWORD PTR [rax+32], xmm0
mov eax, 8549
mov WORD PTR [rdi+48], ax
mov BYTE PTR [rdi+50], 0
call    operator delete(void*, unsigned long)
mov eax, 123
add rsp, 8
ret
.LC0:
.quad   7935393319309894984
.quad   3273110194895396975
.LC1:
.quad   8007513861377913971
.quad   8386118574366356592
.LC2:
.quad   2338053640980164457
.quad   8314037903514690925

This bug report is about the population of the string, i.e. let's ignore the
question of whether any memory should be allocated at all.

g++ should be aware that the string has no visibility outside `bar()` (except
through access using raw arbitrary memory addresses from another thread while
`bar()` is executing). Also, IANALL, even if the allocation can be considered
observable behavior which needs to be maintained - values at that memory
location, which may transiently be present, do not constitute such behavior.
Why even set those values, therefore, when they are not used? At least these
string constants and population statements should be optimized away, into
something like (hand-written assembly):

bar():
sub rsp, 8
mov edi, 51
call    operator new(unsigned long)
mov rdi, rax
mov esi, 51
call    operator delete(void*, unsigned long)
mov eax, 123
add rsp, 8
ret
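To see that the `.quad` constants and the `mov eax, 8549` in GCC's output really are the string contents, they can be decoded byte by byte. The helper below is a small sketch written for this purpose (`decode_le` is a made-up name); it reads the least-significant byte first, which is the order in which x86_64 stores these constants in memory.

```cpp
#include <cstdint>
#include <string>

// Decode the low `count` bytes of `value`, least-significant byte first.
std::string decode_le(std::uint64_t value, unsigned count) {
    std::string out;
    for (unsigned i = 0; i < count; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
    return out;
}
```

The first quad of `.LC0`, 7935393319309894984, decodes to `"Hey... n"`, and 8549 (0x2165) decodes to `"e!"`, the last two characters, which the listing stores at offset 48 just before the NUL at offset 50.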

[Bug c++/93739] Ability to print a type name without aborting compilation

2020-02-14 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93739

--- Comment #4 from Eyal Rozenberg  ---
(In reply to Eyal Rozenberg from comment #3)
A couple more points:

* The error I get (https://godbolt.org/z/5GpR2T) doesn't have the "your type
here" string.
* This forces you to define a variable you're not using. So, you need to also
say [[unused]] in some cases - more clutter.

[Bug c++/93739] Ability to print a type name without aborting compilation

2020-02-14 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93739

--- Comment #3 from Eyal Rozenberg  ---
(In reply to Jonathan Wakely from comment #2)
> Oops, that was meant to be print_type()

Ok, that's a better kludge than mine - it doesn't have the more serious
shortcomings. That makes the motivation for this feature request more about
convenience / aesthetics. It would still be nice to have.

[Bug c++/93739] New: Ability to print a type name without aborting compilation

2020-02-14 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93739

Bug ID: 93739
   Summary: Ability to print a type name without aborting
compilation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Over the past several years, C++ has seen increased use of type deduction via
auto variables (C++11), auto return type (C++14), template deduction guides and
more. The language already had delicate rules regarding decays, references
being added or removed, etc. All of this motivates the developer to sometimes
want to ascertain what the type of an expression or the value of a template
parameter is, in their program, while it is being compiled - with the
information printed to the standard error stream like warnings and errors are.
(And nota bene: Not at run-time).

This is partly doable today already, e.g. with the kludge in the following
example:

  using mystery  = int;

  // ... etc etc

  template <typename T> struct has_type {};
  using foo = typename has_type<mystery>::mystery;

this results in an error whose first line is:

:6:59: error: 'mystery' in 'struct has_type<int>' does not name a type

and tells us the type of mystery is int. This has several drawbacks:

* Abuse of a mechanism with a different intent (although in C++ that is
sometimes considered a good idea...)
* Irrelevant clutter in the output - you need to pay attention and know what
you're looking for.
* [MOST IMPORTANT] Compilation stops when hitting this type check.

(consequences of the above)
* Can't check more than one type at a time this way.
* Can't be used in compilation log parsing - must be introduced and then
removed.
* Effectively unusable if the same expression has a different type when called
from different locations - i.e. within templates: Compilation will stop with
the first instantiation, not the one you want (and preventing this stoppage is
overkill).

I therefore ask that a feature be added to GCC, of allowing the printing of a
type's name at some appropriate point during compilation. I'm not familiar with
the various passes and stages of parsing and comprehending C++ programs in GCC,
but obviously the type is deduced at some point - the same point where the
error in the above example can be printed. Instead, I suggest for some pragma
(new or existing one) to be able to print type names.

Syntax ideas (a bike-shed issue of course):

#pragma print_type_of( mystery )
#pragma message( typeof(mystery) )
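Until something like that exists, a known trick that reports a type through a warning instead of an error is a deprecated function template. This is a sketch assuming C++14 or later; the name `print_type` is borrowed from the discussion in this thread, but this particular definition is an assumption and may differ from what was proposed there.

```cpp
// Hypothetical helper: every use of a specialization triggers a deprecation
// *warning* (not an error), and compilers typically name the specialization
// in the diagnostic, so compilation carries on afterwards.
template <typename T>
[[deprecated("print_type: the template argument is shown in this warning")]]
void print_type() {}
```

A call such as `print_type<decltype(some_expression)>();` would then be expected to produce a `-Wdeprecated-declarations` warning mentioning something like `[with T = int]`, once per call site. It still requires adding and later removing a statement, so it only partly addresses the drawbacks listed above.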

[Bug ipa/89924] [missed-optimization] Function not de-virtualized within the same TU

2019-09-27 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

--- Comment #9 from Eyal Rozenberg  ---
(In reply to Jason Merrill from comment #8)
> I think if the object were not an actual Aint, performing the standard
> conversion to A* should be undefined, allowing the devirtualization.  But
> I'm not finding actual wording to that effect in the current draft.

I'm not sure you _should_ find such language, because it's unnecessary. A
function getting a T* is allowed to assume that the pointer is pointing to a
valid T (and if I were a language lawyer I would tell you where that's stated).
Implicitly, therefore, a C++ program is not required to have any defined
behavior when that T* does not point to a valid T.

[Bug c++/90703] New: A virtuous bug: `=delete` accepted on second declaration

2019-06-01 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90703

Bug ID: 90703
   Summary: A virtuous bug: `=delete` accepted on second
declaration
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

(based on this SE question: https://stackoverflow.com/q/56409551/1593077
and this GodBolt test case: https://godbolt.org/z/YNstQX 
)

Consider this code:

template <typename T> int foo();
template <typename T> int foo() = delete;

it seems to be invalid in C++11 and onward:

[dcl.fct.def.delete]

4 ... A deleted definition of a function shall be the first declaration of
the function...

Un(?)fortunately, GCC accepts this code as valid C++11, beginning with 4.7.1
and all the way up to the "trunk" version that GodBolt uses. Specifically,
version 9.1 accepts it.


(Personally I feel the standard should correspond to GCC's behavior on this
matter but it's not for me to decide.)
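For contrast, this is the ordering the quoted wording permits. It is a sketch only: the `<typename T>` parameter list and the `bar` template are assumptions added for illustration.

```cpp
// Well-formed: the deleted definition is the first declaration of foo.
template <typename T> int foo() = delete;

// An ordinary (non-deleted) definition, on the other hand, may follow an
// earlier declaration of the same template.
template <typename T> int bar();
template <typename T> int bar() { return static_cast<int>(sizeof(T)); }
```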

[Bug other/90566] New: Support demangling with underscore-prefixed string after mangled name

2019-05-21 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90566

Bug ID: 90566
   Summary: Support demangling with underscore-prefixed string
after mangled name
   Product: gcc
   Version: 6.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

libiberty performs the demangling for c++filt, the most commonly-used (and
perhaps only?) tool for demangling C++ names in object files and related file
formats. One of the "ecosystems" which produces such files is CUDA;
specifically in its intermediary representation for GPU code. 

Now, a GPU-device-side function, compiled with clang to PTX, can look like
this, for example: 

  .visible .entry _Z6squarePii(
  .param .u64 _Z6squarePii_param_0,
  .param .u32 _Z6squarePii_param_1
  )
  {

  ld.param.u32 %r1, [_Z6squarePii_param_1];
  mov.u32 %r2, %ctaid.x;
  setp.ge.s32 %p1, %r2, %r1;
  @%p1 bra LBB6_2;
  ld.param.u64 %rd2, [_Z6squarePii_param_0];
  cvta.to.global.u64  %rd3, %rd2;
  mul.wide.u32 %rd4, %r2, 4;
  add.s64 %rd1, %rd3, %rd4;
  ld.global.u32   %r3, [%rd1];
  mul.lo.s32  %r4, %r3, %r3;
  st.global.u32   [%rd1], %r4;
  ret; 
  }

(see https://godbolt.org/z/GcDTVh for clang and nvcc output)

which clearly has mangled names. However, it seems the function parameter name
is somewhat malformed, or non-standard - being a mangled name, followed
immediately by an underscore and more text: mangledblahblah_param_0.

When demangling, the function name gets demangled fine, but the parameter does
not:

.visible .entry square(int*, int)(
.param .u64 _Z6squarePii_param_0,
.param .u32 _Z6squarePii_param_1
)

and from what the c++filt people say - this is libiberty's output. I ask that
libiberty either auto-detect this case, or have an option to detect it; and
when that's turned on, demangle the above into:

  .visible .entry square(int*, int)(
  .param .u64 square(int*, int)_param_0,
  .param .u32 square(int*, int)_param_1
  )

or

  .visible .entry square(int*, int)(
  .param .u64 square(int*, int) param_0,
  .param .u32 square(int*, int) param_1
  )

or something else that's meaningful.

Caveat: I realize that libiberty is FOSS and CUDA involves a bunch of
closed-source software by a company notorious for keeping code and specs
closed, and not making it easy for FOSS developers. Still, we are talking about
something clang compiles; and it's only being mindful of an underscore.

(Note: I first filed this as a bug against c++filt,
https://sourceware.org/bugzilla/show_bug.cgi?id=24557 , and was directed to
file here.)
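The requested behavior can be approximated outside libiberty by splitting the symbol at the suffix and demangling only the prefix. The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and clang; the helper name `demangle_with_suffix` and the choice of `"_param_"` as the marker are assumptions based on the PTX shown above, not an established convention.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle the part of `symbol` before the last
// occurrence of `marker`, and re-attach the marker and what follows it.
std::string demangle_with_suffix(const std::string& symbol,
                                 const std::string& marker = "_param_") {
    const std::size_t pos = symbol.rfind(marker);
    const std::string mangled = symbol.substr(0, pos);  // whole symbol if no marker
    const std::string suffix =
        (pos == std::string::npos) ? std::string() : symbol.substr(pos);

    // Only attempt to demangle names that look like Itanium-mangled symbols.
    if (mangled.compare(0, 2, "_Z") != 0) {
        return symbol;
    }

    int status = 0;
    char* demangled = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        return symbol;  // demangling failed: leave the input untouched
    }
    std::string result = std::string(demangled) + suffix;
    std::free(demangled);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

With this, `_Z6squarePii_param_0` becomes `square(int*, int)_param_0`, i.e. the first of the two output forms suggested above.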

[Bug tree-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy

2019-04-30 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

--- Comment #10 from Eyal Rozenberg  ---
(In reply to rguent...@suse.de from comment #9)
> You'd have to experiment with different GCC versions, but yes.

I was hoping for a more concrete suggestion (which works with multiple GCC
versions)...

[Bug tree-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy

2019-04-29 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

--- Comment #8 from Eyal Rozenberg  ---
(In reply to rguent...@suse.de from comment #5)
> int foo3()
> {
>   struct { int x; int y; } s;
>   s.x = 3;
>   char c = 1;
>   return replace_bytes_3(&s.x,c);
> }
> 
> Coalescing successful!
> Merged into 1 stores

This is very interesting! Do you think I could somehow adapt this example into
a workaround, for existing GCC versions, rather than wait for the bug fix?

[Bug tree-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy

2019-04-28 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

--- Comment #6 from Eyal Rozenberg  ---
> Is the example from real-world code?

Yes. Example: Some machines support atomic instructions on aligned 32 bits or
on 64 bits, but not directly on 1, 2, 3, 5, 6 or 7 bytes. So in order to
atomically change a value of one of those "undesirable" sizes, you have to work
on its corresponding 4-byte or 8-byte stretch: You read it, change it in the
middle, then apply atomic compare-and-swap to it.
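The read, modify, compare-and-swap sequence described here can be sketched as follows, assuming C++11 `std::atomic` on a 32-bit word. The function name and the convention that `byte_index` counts bytes by significance (0 is the lowest byte) rather than by memory address are choices made for this example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically replace one byte inside a 32-bit word.
// byte_index must be in [0, 3]; 0 selects the least-significant byte.
void atomic_replace_byte(std::atomic<std::uint32_t>& word,
                         unsigned byte_index, std::uint8_t value) {
    const unsigned shift = 8 * byte_index;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;

    std::uint32_t expected = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Change the byte "in the middle" of a private copy of the word.
        desired = (expected & ~mask) | (std::uint32_t{value} << shift);
        // On failure, compare_exchange_weak reloads `expected`, so the next
        // iteration recomputes `desired` from the fresh value.
    } while (!word.compare_exchange_weak(expected, desired));
}
```

Replacing byte 1 of the value 3 with 1 gives 259 (= 256 + 3), the same result as `foo1()` in the original report on a little-endian machine.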

[Bug rtl-optimization/90271] [missed-optimization] failure to keep variables in registers during "faux" memcpy

2019-04-28 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

--- Comment #1 from Eyal Rozenberg  ---
Can also reproduce this in C, with slightly different code:

int replace_bytes_3(int v1 ,char v2)
{
   memcpy( (void*) (((char*)&v1)+1) , &v2 , sizeof(v2) );
   return v1;
}

int foo3()
{
  int x = 3;
  char c = 1;
  return replace_bytes_3(x,c);
}


GodBolt: https://godbolt.org/z/1K89xh

Again, clang optimizes this correctly. Note specifically the way it handles the
non-inlined replace_bytes_3.

[Bug rtl-optimization/90271] New: [missed-optimization] failure to keep variables in registers during "faux" memcpy

2019-04-28 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90271

Bug ID: 90271
   Summary: [missed-optimization] failure to keep variables in
registers during "faux" memcpy
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Example on GodBolt: https://godbolt.org/z/Q17L1u

Consider the following functions:

template <typename T1, typename T2>
inline void replace_bytes (T1& v1 ,const T2& v2 ,std::size_t k) noexcept
{
   if (k > sizeof(T1) - sizeof(T2)) { return; }

   std::memcpy( (void*) (((char*)&v1)+k) , (const void*) &v2 , sizeof(T2) );
}

For plain-old-data types, this is nothing but the manipulation of v1's bytes
(and there are no pointer aliasing issues). So, at least when k is known at
compile-time, the compiler should IMHO keep the activity to within registers.

And yet - GCC doesn't: With the extra code

int foo1()
{
  int x = 3;
  char c = 1;
  replace_bytes(x,c,1);
  return x;
}

we get (at maximum optimization):

foo1():
mov DWORD PTR [rsp-4], 3
mov BYTE PTR [rsp-3], 1
mov eax, DWORD PTR [rsp-4]
ret

This, while clang _does_ optimize fully and has foo1() simply return 259 (=
256+3).

Even if we make k a template parameter - it doesn't help.
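For reference, the template-parameter variant mentioned here could look like the sketch below. The names `replace_bytes_at` and `foo1_fixed_offset` and the two `static_assert`s are assumptions added for illustration; this is not the exact code that was tried, and nothing here is claimed to change GCC's output.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical variant of replace_bytes with the offset known at compile time.
template <std::size_t K, typename T1, typename T2>
inline void replace_bytes_at(T1& v1, const T2& v2) noexcept {
    static_assert(std::is_trivially_copyable<T1>::value &&
                  std::is_trivially_copyable<T2>::value,
                  "byte-wise replacement needs trivially copyable types");
    static_assert(K + sizeof(T2) <= sizeof(T1), "replacement must fit inside v1");
    // Overwrite sizeof(T2) bytes of v1, starting at byte offset K.
    std::memcpy(reinterpret_cast<char*>(&v1) + K, &v2, sizeof(T2));
}

int foo1_fixed_offset() {
    int x = 3;
    char c = 1;
    replace_bytes_at<1>(x, c);
    return x;
}
```

The bounds check moves from a runtime `if` to a `static_assert`, so an out-of-range offset becomes a compile error instead of a silent no-op.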

[Bug ipa/89924] [missed-optimization] Function not de-virtualized within the same TU

2019-04-04 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

--- Comment #5 from Eyal Rozenberg  ---
(In reply to Jan Hubicka from comment #3)
> The reason why we do not devirtualize is that only information about Aint is
> the type of function parameter 

"Only"? :-)

> and we do not believe it implies the type of
> memory location it points to because there is no read or anything from that
> pointer before it is casted to struct A* and pointer of a given type does
> not need to necessarily point to memory location of the same type unless you
> dereference it.
> 
> Is it really valid to devirtualize here?

IANALL, but yes. You're using terms like "belief" and talk about speculative
inference based on partial evidence. Why? foo_virtual gets a pointer to an
Aint. Why should the compiler need to second-guess this?

[Bug tree-optimization/89924] New: [missed-optimization] Function not de-virtualized within the same TU

2019-04-02 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924

Bug ID: 89924
   Summary: [missed-optimization] Function not de-virtualized
within the same TU
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Related StackOverflow question: https://stackoverflow.com/q/55464578/1593077
GodBolt example: https://godbolt.org/z/l0vdFG

In the following code:

  struct A {
      virtual A& operator+=(const A& other) noexcept = 0;
  };

  void foo_inner(int *p) noexcept { *p += *p; }
  void foo_virtual_inner(A *p) noexcept { *p += *p; }

  void foo(int *p) noexcept
  {
      return foo_inner(p);
  }

  struct Aint : public A {
      int i;
      A& operator+=(const A& other) noexcept override final
      {
          i += dynamic_cast<const Aint&>(other).i;
          //  i += reinterpret_cast<const Aint&>(other).i;
          return *this;
      }
  };

  void foo_virtual(Aint *p) noexcept
  {
      return foo_virtual_inner(p);
  }

Both functions, `foo()` and `foo_virtual()`, should compile to the same thing.
But g++ 8.3 (on x86_64) with -O3 produces:
```
foo(int*):
        sal     DWORD PTR [rdi]
        ret
foo_virtual(Aint*):
        mov     rax, QWORD PTR [rdi]
        mov     rax, QWORD PTR [rax]
        cmp     rax, OFFSET FLAT:Aint::operator+=(A const&)
        jne     .L19
        push    rbx
        xor     ecx, ecx
        mov     edx, OFFSET FLAT:typeinfo for Aint
        mov     esi, OFFSET FLAT:typeinfo for A
        mov     rbx, rdi
        call    __dynamic_cast
        test    rax, rax
        je      .L20
        mov     eax, DWORD PTR [rax+8]
        add     DWORD PTR [rbx+8], eax
        pop     rbx
        ret
.L19:
        mov     rsi, rdi
        jmp     rax
foo_virtual(Aint*) [clone .cold.1]:
.L20:
        call    __cxa_bad_cast
```
i.e. it doesn't manage to de-virtualize `Aint::operator+=` - although it really
should. It has all the necessary information, as far as I can tell.

As a side note, even regardless of de-virtualization, there's a whole lot of
code there, while with clang 8, we only get:
```
foo_virtual(Aint*):  # @foo_virtual(Aint*)
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax]
mov rsi, rdi
jmp rax 
```
which at least doesn't need the type info.
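
For comparison, here is a minimal sketch of the case where no inference is
needed at all: when the call is made directly on an `Aint*`, the static type
already names a `final` overrider, so the compiler is allowed to bind the call
statically. `double_in_place` is a hypothetical helper name, not part of the
report, and the virtual destructor is added only to keep the sketch tidy.
Whether a given GCC version actually emits a direct call here should be checked
on Compiler Explorer rather than assumed.

```cpp
struct A {
    virtual A& operator+=(const A& other) noexcept = 0;
    virtual ~A() = default;
};

struct Aint : public A {
    int i = 0;
    A& operator+=(const A& other) noexcept override final
    {
        // A failed reference cast throws std::bad_cast; since the function
        // is noexcept, that would terminate the program.
        i += dynamic_cast<const Aint&>(other).i;
        return *this;
    }
};

// Hypothetical helper: same operation as foo_virtual_inner(), but taking Aint*.
// The static type is Aint and Aint::operator+= is final, so no vtable lookup
// is semantically required for this call.
void double_in_place(Aint* p) noexcept { *p += *p; }
```

Calling `double_in_place()` on an `Aint` whose `i` is 21 leaves `i` at 42.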

[Bug ipa/89567] [missed-optimization] Should not be initializing unused struct parameter members

2019-03-04 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567

--- Comment #4 from Eyal Rozenberg  ---

> In the first example, the interprocedural constant propagation pass
> (IPA-CP) found that foo1 is so small that copying all of it might be
> worth not passing the unused argument and so it does, that is why
> you'll find function foo1 twice in the assembly. 

Why does this have anything to do with constant propagation? I also don't
understand the sense in two identical copies.

It also sounds like "the wrong optimization" is being used if it's not about
noticing unused parameters.

> This functionality
> in the pass is there just "on the side" and it is not easy to make it
> also work with aggregates, not even desirable (that is the job of a
> different pass, see below).
>
> Both examples are compiled better if you make foo1 and foo2 static.

This really makes no sense to me! bar() is not affected by other TUs at all...

> In the latter case, you get exactly what you want, the structure is
> split and only the used part survives.  In the first example, you
> don't get a clone emitted which you probably don't need.  Both of
> these transformation are done by a pass called interprocedural scalar
> replacement of aggregates (IPA-SRA), which specifically also aims to
> remove unused arguments, but it never creates multiple clones.

I like this pass :-) ... so, why does it work for the static case with bar2()
but doesn't work with bar1() ?


> I'm afraid you'd need to provide a strong real-world use-case to make
> me investigate how to make IPA-SRA clone so you might not need static
> and/or LTO because that would mean devising a cost/benefit
> (size/speedup) heuristics and that is not easy.

For now I'm just trying to understand why this isn't already happening. Then
I'll perhaps try to understand why clang does do this.

But - don't necessarily clone. IIUC,  cloning would possibly mean removing that
parameter even though it's a field of a struct. But even if you _don't_ clone,
functions calling foo() should still not have to initialize that member. It
seems like we're talking about different optimizations.
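
The `static` variant mentioned in the quoted reply can be sketched as follows.
This is only an illustration of that suggestion: the `_static` suffixes are
made-up names, and the claim that IPA-SRA then drops the unused member comes
from the reply quoted above, not from anything verified here.

```cpp
struct two_ints { int x, y; };

// Internal linkage: the compiler can see every caller, so it is free to
// change the function's signature (for example, to drop the unused `y`).
static __attribute__((noinline)) int foo2_static(struct two_ints s)
{
    return s.x;
}

int bar2_static(int* a)
{
    struct two_ints ti = { a[5], a[10] };
    int b = foo2_static(ti);
    return b * b;
}
```

The observable behavior is unchanged; only the linkage of the callee differs
from `foo2()`.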

[Bug ipa/89567] [missed-optimization] Should not be initializing unused struct parameter members

2019-03-04 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Richard Biener from comment #1)
> You are looking for IPA DSE 

I'm not a compiler expert and don't know what this means. Even literally, I
don't know what these acronyms stand for.

> by marshalling through a struct you make GCCs job a lot harder...  

Well, first - yes, I suppose this could make things harder. However, as GCC
does its magic, I presume that at some point the struct abstraction is lost,
and we only have code which passes values to another function in registers, and
one of these values is unused. So at some point along the way this might be
easier than analyzing structs.

> "pro-active" IPA-SRA might help here,

Again, I have no idea what that is... could I trouble you to elaborate just a
bit more?

[Bug rtl-optimization/89567] New: [missed-optimization] Should not be initializing unused struct parameter members

2019-03-03 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89567

Bug ID: 89567
   Summary: [missed-optimization] Should not be initializing
unused struct parameter members
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

The issue is captured in the example here:
https://gcc.godbolt.org/z/_U4X80

The issue was first described in this StackOverflow question:
https://stackoverflow.com/q/54964323/1593077


Consider the following code:

  __attribute__((noinline)) int foo1(int x, int y)
  {
      return x;
  }

  int bar1(int* a)
  {
      int b = foo1(a[5], a[10]);
      return b * b;
  }

GCC (with -O3) optimizes-out the initialization of the y parameter with the
a[10] argument, saving one of the two memory reads. This is good.

Now suppose we put those two int parameters into a struct:

  struct two_ints { int x, y; };

  __attribute__((noinline)) int foo2(struct two_ints s)
  {
      return s.x;
  }

  int bar2(int* a)
  {
      struct two_ints ti = { a[5], a[10] };
      int b = foo2(ti);
      return b * b;
  }

There shouldn't be any difference, right? The parameters (certainly as far as
the assembly, which recognizes no such thing as "structs", is concerned) are
two integers; and the second one is not used. So I would expect to see the same
assembly code. Yet... I don't. Both integers are initialized and two `mov
eax, DWORD PTR [rdx+something]` instructions are executed.


This behavior also occurs with "GCC trunk" on GodBolt, i.e. GCC version
9.0.1.

[Bug tree-optimization/89479] __restrict on a pointer ignored when a function is passed alongside it

2019-02-26 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479

--- Comment #6 from Eyal Rozenberg  ---
Thanks to a friendly StackOverflow user, I should also report that (about) the
same code produces the same compiler behavior disparity for proper C:

https://godbolt.org/z/kVYqp8

with the slight modification being `void g(void)` instead of `void g()` in the
function signatures.

[Bug tree-optimization/89479] __restrict

2019-02-26 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479

--- Comment #5 from Eyal Rozenberg  ---
(In reply to Richard Biener from comment #4)
> exposing __restrict to the IL). 

Is "IL" an acronym for "Intermediate Language"? Remember many bug
posters/readers are not GCC developers and don't know all the lingo.

> To elaborate further to successfully mark a function call
> with clique == 1 and base == 0 we have to prove the pointer marked restrict
> doesn't escape the function through calls

Certainly, calling g() could be just the same as writing to an alias of the x
pointer. But - __restrict is how we guarantee this doesn't happen (or can be
ignored) even when the compiler can't prove that's the case on its own. So I'm
not sure I understand what you're suggesting with your comment. I suppose you
could try and "disprove the __restrict" to give a warning, but other than that
- why not just respect it?

[Bug tree-optimization/89479] __restrict

2019-02-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479

Eyal Rozenberg  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |---

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Marc Glisse from comment #1)
> Seems similar enough.

With respect  - this is not about x being a const __restrict pointer; what I
said (including the clang behavior) applies exactly the same when we remove the
const. See: https://godbolt.org/z/hH643a (where the const is gone).

[Bug rtl-optimization/89479] New: __restrict

2019-02-23 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89479

Bug ID: 89479
   Summary: __restrict
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

(This is all illustrated at: https://godbolt.org/z/nz2YXE )

Let us make our language C++17. Consider the following function:

  int foo(const int* x, void g())
  {
  int result = *x;
  g();
  result += *x;
  return result;
  }

since we have no aliasing guarantees, we must assume the invocation of g() may
change the value at address x, so we must perform two reads from x to compute
the result - one before the call and one after.

If, however, we add __restrict__ specifier to x:

  int bar(const int* __restrict__ x, void g())
  {
  int result = *x;
  g();
  result += *x;
  return result;
  }

we may assume x "points to an unaliased integer" (as per
https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html ). That means we
can read from address x just once, and double the value to get our result. I
realize there's a subtle point here, which is whether being "unaliased" also
applies to g()'s behavior. It is my understanding that it does.

Well, clang 7.0 understands things the way I do, and indeed optimizes one of
the reads away in `bar()`. But - g++ 8.3 (and g++ "trunk", whatever that means
on GodBolt) doesn't do so, and reads _twice_ from x both in `foo()` and in
`bar()`.
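
The transformation expected for `bar()` can be written out by hand. The sketch
below is an illustration only: `bar_single_read`, `calls_made` and `count_call`
are hypothetical names, and the code is equivalent to `bar()` only under the
assumption that `g()` does not modify `*x`, which is what `__restrict__` is
understood to promise in this report.

```cpp
// Hand-written form of the single-read version: read *x once, reuse the value.
int bar_single_read(const int* __restrict__ x, void g())
{
    const int value = *x;
    g();
    return value + value;
}

// Hypothetical callback, used only to exercise the function above.
static int calls_made = 0;
void count_call() { ++calls_made; }
```

With `*x` equal to 21, `bar_single_read(x, count_call)` returns 42 and invokes
the callback exactly once.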

[Bug c++/88371] New: Gratuitous (?) warning regarding an implicit conversion in pointer arithmetic

2018-12-05 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88371

Bug ID: 88371
   Summary: Gratuitous (?) warning regarding an implicit
conversion in pointer arithmetic
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

See: https://godbolt.org/z/tYn9SX
for a live example and comparison with clang

See: https://stackoverflow.com/q/53628998/1593077
for the question motivating this bug report.

---

Consider the following program:

#include <iostream>

template <typename T>
struct wrapper {
    T t;
    operator T() const { return t; }
    T get() const { return t; }
};

int main() {
    int a[10];
    int* x { a } ;
    wrapper y1{2};
    wrapper y2{2};
    wrapper y3{2};

    std::cout << (x + y1) << '\n';
    std::cout << (x + y2) << '\n';
    std::cout << (x + y3) << '\n'; // this triggers a warning
    std::cout << (x + y3.get()) << '\n';
}

When compiling it (with g++ 8.2.0) with -std=c++2a -Wsign-conversion we get:

a.cpp: In function ‘int main()’:
a.cpp:20:23: warning: conversion to ‘long int’ from ‘long unsigned int’ may
change the sign of the result [-Wsign-conversion]
 std::cout << (x + y3) << '\n'; // this triggers a warning
   ^~
As far as I can tell, either both the third and fourth lines should trigger a
warning, or neither should.

Also, a comment on the Stackoverflow page suggested this clause:
http://eel.is/c++draft/over.match.oper#9

may be relevant here.
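
Independently of which behavior is correct, the warning can be avoided by
making the conversion to the pointer-difference type explicit. A minimal
sketch, assuming the offending wrapper holds an `unsigned long` (inferred from
the "long unsigned int" in the diagnostic); `advance_by` is a hypothetical
helper that does not appear in the report:

```cpp
#include <cstddef>

template <typename T>
struct wrapper {
    T t;
    operator T() const { return t; }
    T get() const { return t; }
};

// Hypothetical helper: the unsigned-to-signed conversion is spelled out, so
// -Wsign-conversion has nothing implicit left to flag.
template <typename T>
int* advance_by(int* p, const wrapper<T>& offset)
{
    return p + static_cast<std::ptrdiff_t>(offset.get());
}
```

For an array `a` and a `wrapper<unsigned long>` holding 2, `advance_by(a, y)`
yields `a + 2`.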

[Bug tree-optimization/87925] Missed optimization: Distinct-value if-then-else chains treated differently than switch statements

2018-11-08 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87925

--- Comment #5 from Eyal Rozenberg  ---
(In reply to Martin Liška from comment #3)
> Currently we only do switch -> balanced decision tree (read series of
> if-then-else statements). Well, definitely a potential enhancement; the
> question is whether it is worth doing that.

That is a question for another bug. I'm just saying that these two cases (or
some expansion thereof, e.g. with some fallthrough on the case side, or with
several distinct values in each if statement) should be treated the same. i.e.
whatever optimizations are considered for one of them should be considered for
the other.

(In reply to Martin Liška from comment #4)
> This is not a problem because it's a constant expression that is evaluated
> in the C++ front-end. Thus you don't get any penalty.

The specific use example at the link is a static_assert, but the in() function
there works with runtime values as well. So - not just compile-time
computation. Another example - un-type-erasure dispatches, which may go into
tight loops: https://stackoverflow.com/a/38924396/1593077

[Bug rtl-optimization/87925] Missed optimization: Distinct-value if-then-else chains treated differently than switch statements

2018-11-07 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87925

Eyal Rozenberg  changed:

   What|Removed |Added

Version|unknown |9.0

--- Comment #1 from Eyal Rozenberg  ---
See also: 
https://stackoverflow.com/questions/53198276/do-compilers-optimize-switches-differently-than-long-if-then-else-chains

[Bug rtl-optimization/87925] New: Missed optimization: Single-value if-then-else chains treated differently than switch'es

2018-11-07 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87925

Bug ID: 87925
   Summary: Missed optimization: Single-value if-then-else chains
treated differently than switch'es
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Have a look at this GodBolt example: https://gcc.godbolt.org/z/zR03rA

On one hand, we have:

void foo(int i) {
    switch (i) {
    case 1: boo<1>(); break;
    case 2: boo<2>(); break;
    case 3: boo<3>(); break;
    case 4: boo<4>(); break;
    // etc. etc.
    }
}

on the other hand we have the same, but using an if-then-else chain:

void bar(int i) {
    if      (i == 1) boo<1>();
    else if (i == 2) boo<2>();
    else if (i == 3) boo<3>();
    else if (i == 4) boo<4>();
    // etc. etc.
}

The switch statement gets a jump table; the if-then-else chain - does not. At
the link, there are 20 cases; g++ starts using a jump table with 4 switch
values.

This is not just a matter of programmers needing to remember to prefer switch
statements (which it's better not to require of them), but rather that
if-then-else chains are sometimes generated by expansion of templated code,
e.g. this example for checking for membership in a set of values (= all values
of an enum):

https://stackoverflow.com/a/53191264/1593077

while switch() statements of variable do not get generated AFAICT. It would
thus be quite useful if such generated code would not result in
highly-inefficient long chains of comparisons.
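
As an illustration of how templated code ends up producing such chains, here
is a minimal sketch of a membership check that expands, once the recursion is
inlined, into a sequence of `==` comparisons. It was written for this note and
is not the code from the linked answer; `is_one_of` is a hypothetical name.

```cpp
// Base case: no candidates left, so the value is not in the set.
template <typename T>
constexpr bool is_one_of(T)
{
    return false;
}

// Recursive case: compare against the first candidate, then the rest.
// After inlining this amounts to v == c1 || v == c2 || ... || v == cN.
template <typename T, typename... Ts>
constexpr bool is_one_of(T v, T first, Ts... rest)
{
    return v == first || is_one_of(v, rest...);
}
```

For example, `is_one_of(3, 1, 2, 3, 4)` is true and `is_one_of(7, 1, 2, 3, 4)`
is false; with a runtime first argument, the comparisons happen at run time.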

[Bug tree-optimization/10624] unroll-loops can't unroll nested constant loops

2018-10-08 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10624

Eyal Rozenberg  changed:

   What|Removed |Added

 CC||eyalroz at technion dot ac.il

--- Comment #8 from Eyal Rozenberg  ---
This seems to have been solved at some point. Compiling with -O3 -funroll-loops
 using GCC 8.2 on GodBolt: https://godbolt.org/z/4gBcw-

We get:

.LC0:
        .string "%d, %d\n"
unroll_me:
        sub     rsp, 8
        xor     edx, edx
        mov     esi, 1
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        call    printf
        xor     edx, edx
        mov     esi, 2
        xor     eax, eax
        mov     edi, OFFSET FLAT:.LC0
        call    printf
        mov     edx, 1
        xor     eax, eax
        add     rsp, 8
        mov     esi, 2
        mov     edi, OFFSET FLAT:.LC0
        jmp     printf

Which is quite unrolled.

[Bug tree-optimization/87543] Inconsistency in noticing a constant result rather than emitting code for a loop

2018-10-08 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87543

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Richard Biener from comment #1)
> The issue at -O2 is etc.

That is one issue, but there is the question of the changes in behavior between
versions and when `-march` is used. I don't know if you guys are actively
maintaining 7.x or 6.x ; assuming you do, each of them should at least exhibit
coherent behavior here.

> because of the awkward IV structure PRE present us with

I assume other GCC devs will understand what this means, but for my benefit as
the lay reporter - can you define (or link to a definition of) what "PRE" is
and what is an "awkward IV structure"? (I'm guessing the acronym expands to
Induction Variable.)

[Bug c++/87543] New: Missed opportunity to compute constant return value at compile time

2018-10-06 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87543

Bug ID: 87543
   Summary: Missed opportunity to compute constant return value at
compile time
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Brief illustration on GodBolt: https://godbolt.org/z/sQyNGA
A related question on StackOverflow:
https://stackoverflow.com/q/52677512/1593077


Consider the following two functions:

#include <numeric>

int f1()
{
    int arr[] = {1, 2, 3, 4, 5};
    auto n = sizeof(arr)/sizeof(arr[0]);
    return std::accumulate(arr,  arr + n, 0);
}

int f2()
{
    int arr[] = {1, 2, 3, 4, 5};
    auto n = sizeof(arr)/sizeof(arr[0]);
    int sum = 0;
    for(int i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
}

Both functions return 15, always; and while they're not marked constexpr, this
can clearly be realized by the compiler. In fact, it is, if we compile with
-O3 (with GCC 8.2). However, with -O2, we get the following result:

f1():
movabs  rax, 8589934593
lea rdx, [rsp-40]
mov ecx, 1
mov DWORD PTR [rsp-24], 5
mov QWORD PTR [rsp-40], rax
lea rsi, [rdx+20]
movabs  rax, 17179869187
mov QWORD PTR [rsp-32], rax
xor eax, eax
jmp .L3
.L5:
mov ecx, DWORD PTR [rdx]
.L3:
add rdx, 4
add eax, ecx
cmp rdx, rsi
jne .L5
ret
f2():
mov eax, 15
ret


I don't think `std::accumulate` should have any code which should make -O2 fail
to notice the optimization opportunity in `f1()`. But if that assertion might
be debatable, surely adding -march=skylake to the -O3 can only result in
stronger optimization, right? However, it results in _both_ functions, rather
than just `f1()`, failing to fully optimize.


I asked about part of this issue at StackOverflow and a reply (by Florian
Weimer) suggested this might be a regression relative to GCC 6.3 . And, indeed,
if we switch the GCC version to 6.3 - both functions are not-fully-optimized in
-O2, and fully-optimized with -O3:
https://godbolt.org/z/JOqCoC

If I try GCC 7.3, things get weird in yet a different way: -O2 optimizes both
functions fully, and -O3 optimizes just the _first_ one.
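
When the constant result matters regardless of optimization level, one option
is to stop relying on the optimizer and request compile-time evaluation
explicitly. A minimal sketch, assuming C++14 or later (needed for the loop
inside a `constexpr` function); `f2_constexpr` is a hypothetical name and this
is a workaround, not a fix for the inconsistency reported here:

```cpp
constexpr int f2_constexpr()
{
    int arr[] = {1, 2, 3, 4, 5};
    int sum = 0;
    for (int value : arr) {
        sum += value;
    }
    return sum;
}

// The static_assert forces evaluation during compilation, so this check
// does not depend on -O2, -O3 or -march settings.
static_assert(f2_constexpr() == 15, "sum of 1..5 must be 15");
```

A plain runtime call to `f2_constexpr()` still returns 15; only uses in
constant-expression contexts are guaranteed to be computed at compile time.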

[Bug web/85837] Listing of all error and warning messages

2018-05-18 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837

--- Comment #7 from Eyal Rozenberg  ---
(In reply to Jonathan Wakely from comment #5)
> Be the change that you want to see in the world.
> 
> If you want this, make it happen.

Well, I already started by filing this bug, but point taken.

> (In reply to Eyal Rozenberg from comment #4)
> > Fair enough, but, honestly - if the page says "Please, feel free to suggest
> > new content in gcc-help mailing list" - practically nobody will contribute.
> 
> Why not?

Really? Ok, I'll explain: Many/most people familiar with collaboratively-edited
resources such as Wikis or Q&A sites expect either immediate ability to edit
content, or a requirement of at most website registration. What this line is
telling visitors is (with slight over-dramatization): "Don't expect to be able
to edit existing content on this page, ever. Don't expect to easily add content
to this page, ever. If you want to even add anything to this page, you have to
increase your level of commitment to that of being a mailing list member.
You'll have to talk to people on that mailing list. You'll have to convince
them your addition is important. Then maybe it'll be added." - this amounts to
telling most people to go away.

[Bug web/85837] Listing of all error and warning messages

2018-05-18 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837

--- Comment #4 from Eyal Rozenberg  ---
(In reply to Jonathan Wakely from comment #3)
> There's https://gcc.gnu.org/wiki/VerboseDiagnostics for a few such errors.

Well, that's a (tiny) start... however:

* I wouldn't have found it if you hadn't provided the link - and I did
search the Wiki (albeit not very thoroughly)
* I think that has low search engine visibility
* I believe there should be some auto-generated skeleton of that (either a
single page or multiple pages) which collects all error messages.
* I would definitely separate the language-specific errors for different
languages  (perhaps an even finer separation into pages is called for, but
certainly at least that)

> This absolutely should be done by users, not the GCC developers. We're all
> busy working on GCC already, and if we knew how to make the diagnostics
> easier to understand then we'd already have done it.

Fair enough, but, honestly - if the page says "Please, feel free to suggest new
content in gcc-help mailing list" - practically nobody will contribute.

Also, I'm sure that some of this could be adapted from other sources
online.

[Bug web/85837] Listing of all error and warning messages

2018-05-18 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Andrew Pinski from comment #1)
> We try to improve error messages rather than list all of the error messages
> out.

But the listed error messages must balance readability/accessibility with
conciseness. Specifically, an error message will never have a short example of
a typical error and a correction. Or an explanation, in a few sentences, of
some concept referred to by the message, or a quotation of a paragraph from the
language standard and so on.

[Bug web/85837] New: Listing of all error and warning messages

2018-05-18 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85837

Bug ID: 85837
   Summary: Listing of all error and warning messages
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: web
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Compiler error and warning messages are sometimes difficult to understand -
especially (but not exclusively) for novice developers. They are also typically
concise, and assume some knowledge of relevant terms, which the program author
may not know, despite being able to write a program.

I also note that many users repeatedly ask questions on web forums and Q&A
sites (e.g. StackOverflow) regarding specific error messages they get - not
just asking "what's wrong with my code which causes the error?", but rather
"What does this message mean? I don't understand what it says."

Now, the GCC manual does not seem to include such a listing, and I could not find
it on the Wiki either. Assuming it indeed doesn't exist - I believe that it
should. If it does exist, then the bug is that it's difficult to notice/locate.

Note that to realize such a listing it should be possible to harness more than
just the GCC developers, if it's done through the Wiki. (Of course people would
need to be attracted to the Wiki to assist in doing this.)

[Bug middle-end/84083] [missed optimization] loop-invariant strlen() not hoisted out of loop

2018-01-29 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84083

--- Comment #4 from Eyal Rozenberg  ---
(In reply to Richard Biener from comment #3)
> Yes, we don't currently implement restrict disambiguation for calls.

So, would that account for the different compilation result for test1() and
test2() in the following code:

#include <string.h>

inline size_t my_strlen(const char* __restrict__ s)
{
    const char* p = s;
    while(*p != '\0') { p++; }
    return p - s;
}

size_t test1()
{
    static const char* hw = "Hello, world!";
    return my_strlen(hw);
}

size_t test2()
{
    static const char* hw = "Hello, world!";
    return strlen(hw);
}

where test2() compiles to just returning a fixed value while test1() executes a
loop (See https://godbolt.org/g/CvVxru) ?

[Bug rtl-optimization/84083] [missed optimization] loop-invariant strlen() not hoisted out of loop

2018-01-28 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84083

--- Comment #2 from Eyal Rozenberg  ---
(In reply to Andrew Pinski from comment #1)
> I think bar can still change the value of what ss points to.

What, you mean, by walking up the stack? I don't see why the compiler should
accommodate that by avoiding hoisting.

If you mean because ss is also somehow visible to bar(), then - the
__restrict__ guarantees we don't have to worry about that. IIANM.

[Bug rtl-optimization/84083] New: [missed optimization] loop-invariant strlen() not hoisted out of loop

2018-01-28 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84083

Bug ID: 84083
   Summary: [missed optimization] loop-invariant strlen() not
hoisted out of loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Consider the following code:

#include <string.h>

void bar(char c);

void foo(const char* __restrict__ ss)
{
    for (int i = 0; i < strlen(ss); ++i)
    {
        bar(*ss);
    }
}

To my understanding, the fact that ss is __restrict__ed (and the fact that it
isn't written through or that it's a const pointer) is sufficient to allow the
compiler to assume the memory accessible via ss remains constant, and thus that
strlen(ss) will return the same value.

But - that's not what happens (with GCC 7.3):

.L6:
        movsx   edi, BYTE PTR [rbp+0]
        mov     rbx, r12
        call    bar(char)
.L3:
        mov     rdi, rbp
        lea     r12, [rbx+1]
        call    strlen
        cmp     rax, rbx
        ja      .L6

(obtained with https://godbolt.org/g/vdGSBe )

Now, I'm no compiler expert, so maybe there are considerations I'm ignoring,
but it seems to me the compiler should be able to hoist the heavier code up
above the loop.

Cf. https://stackoverflow.com/q/48482003/1593077
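
Until the compiler does this on its own, the hoisting can be done by hand. A
minimal sketch: `foo_hoisted` is a hypothetical name, and `bar` is given a
stand-in body here (a call counter) purely so the example is self-contained.
The rewrite is only equivalent to `foo()` if `bar()` never changes the string,
which is the assumption this report attaches to `__restrict__`.

```cpp
#include <string.h>

// Stand-in for the external bar() of the report: it just counts its calls.
static int bar_calls = 0;
void bar(char) { ++bar_calls; }

void foo_hoisted(const char* __restrict__ ss)
{
    // strlen() is evaluated once, before the loop, instead of per iteration.
    const size_t length = strlen(ss);
    for (size_t i = 0; i < length; ++i)
    {
        bar(*ss);
    }
}
```

Calling `foo_hoisted("abc")` invokes `bar()` three times, as the original loop
would.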

[Bug rtl-optimization/83952] [missed optimization] difference calculation for floats vs ints in a loop

2018-01-21 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952

--- Comment #8 from Eyal Rozenberg  ---
Andrew, Marc: Sorry for the mess with the other bug. If only Bugzilla had an
"edit comment" feature I wouldn't have opened this second one.

[Bug rtl-optimization/83952] [missed optimization] difference calculation for floats vs ints in a loop

2018-01-20 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952

--- Comment #1 from Eyal Rozenberg  ---
Also seeing this with -O3 -fno-unroll-loops -fno-tree-loop-vectorize :
https://godbolt.org/g/r2v7X8

[Bug rtl-optimization/83952] New: [missed optimization] difference calculation for floats vs ints in a loop

2018-01-20 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952

Bug ID: 83952
   Summary: [missed optimization] difference calculation for
floats vs ints in a loop
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Created attachment 43195
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43195&action=edit
Code exemplifying the issue

Consider the following code:

template <typename T>
void foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

template void foo(int* __restrict__ a);
template void foo(float* __restrict__ a);

(This is based on example 7.26 in Agner Fog's Optimizing Software in C++; but
the use of C++ here is immaterial).

The int version compiles, with -O2, into:

void foo(int*):
xor eax, eax
.L2:
mov DWORD PTR [rdi], eax
add eax, 2
add rdi, 4
cmp eax, 200
jne .L2
rep ret

One would expect that the float version would compile into something similar,
except that instead of rdi we would have a floating-point register, initialized
to 0 and incremented by float 2.0 with each iteration. Instead, we get:

void foo(float*):
        xor     eax, eax
.L6:
        pxor    xmm0, xmm0
        add     rdi, 4
        cvtsi2ss        xmm0, eax
        add     eax, 2
        movss   DWORD PTR [rdi-4], xmm0
        cmp     eax, 200
        jne     .L6
        rep ret

which seems to be much slower.

Checked here: https://godbolt.org/g/t8Hvyn
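
The code one would hope to get for the float case corresponds to the following
hand-written floating-point induction variable. A minimal sketch;
`foo_float_manual` is a hypothetical name. All values involved (even integers
up to 198) are exactly representable as `float`, so for this particular loop
the stored results are identical to those of the conversion-based version; in
general, repeated floating-point addition and integer-to-float conversion need
not agree.

```cpp
void foo_float_manual(float* __restrict__ a)
{
    float val = 0.0f;              // floating-point induction variable
    for (int i = 0; i < 100; i++) {
        a[i] = val;
        val += 2.0f;               // replaces the int-to-float conversion
    }
}
```

After the call, `a[i]` holds `2 * i` for every index from 0 to 99.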

[Bug rtl-optimization/83951] [missed optimization] difference calculation for floats vs ints in a loop

2018-01-20 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951

Eyal Rozenberg  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Eyal Rozenberg  ---
Whoops, some typos. Let me redo this. Sorry.

[Bug other/83951] [missed optimization] difference calculation for floats vs ints in a loop

2018-01-20 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951

--- Comment #1 from Eyal Rozenberg  ---
Created attachment 43194
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43194&action=edit
Source producing the optimized (int) and unoptimized (float) object code

[Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop

2018-01-20 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951

Bug ID: 83951
   Summary: [missed optimization] difference calculation for
floats vs ints in a loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Consider the following code:

template <typename T>
int foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

template int foo(int* __restrict__ a);
template int foo(float* __restrict__ a);

(This is based on example 7.26 in Agner Fog's Optimizing Software in C++; but
the use of C++ here is immaterial).

The int version compiles, with -O2, into:

foo(int*):
xor eax, eax
.L2:
mov DWORD PTR [rdi], eax
add eax, 2
add rdi, 4
cmp eax, 200
jne .L2
rep ret

One would expect that the float version would compile into something similar,
except that instead of rdi we would have a floating-point register, initialized
to 0 and incremented by float 2.0 with each iteration. Instead, we get:

int foo(float*):
        xor     eax, eax
.L6:
        pxor    xmm0, xmm0
        add     rdi, 4
        cvtsi2ss        xmm0, eax
        add     eax, 2
        movss   DWORD PTR [rdi-4], xmm0
        cmp     eax, 200
        jne     .L6
        rep ret

which seems to be much slower.

Checked here: https://godbolt.org/g/RVBNyY

[Bug tree-optimization/59970] Bogus Wuninitialized warnings at low optimization levels

2017-01-26 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59970

Eyal Rozenberg  changed:

   What|Removed |Added

 CC||eyalroz at technion dot ac.il

--- Comment #6 from Eyal Rozenberg  ---
Chiming in after having noticed this issue with GCC 5.4.0 20160609 on Linux
Mint 18.1 with Boost 1.58.0 (using lexical_cast). Quite annoying...

[Bug rtl-optimization/78963] New: Missed optimization opportunity in copies of small unaligned data

2017-01-01 Thread eyalroz at technion dot ac.il
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78963

Bug ID: 78963
   Summary: Missed optimization opportunity in copies of small
unaligned data
   Product: gcc
   Version: 6.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Preliminary notes:
* This bug report stems from a StackOverflow question I asked:
http://stackoverflow.com/q/41407257/1593077
* This bug regards the x86_64 architecture, but may apply elsewhere.
* This bug regards -O3 optimizations
* Everything described here is about the same for GCC 6.3 and 7 - whatever
version of it GodBolt uses.
* The entire bug is demonstrated here: https://godbolt.org/g/lDJSRm plus here
https://godbolt.org/g/9Y2ebd


Consider the task of copying 3-byte values from one place to another. If both
those places are in memory, it seems reasonable to do four moves, and indeed
GCC compiles this:

  #include <string.h>

  typedef struct { unsigned char data[3]; } uint24_t;

  void f(uint24_t* __restrict__ dest, uint24_t* __restrict__ src) {
    memcpy(dest,src,3); }

into this (clipping the instructions for the return value): 

  f(uint24_t*, uint24_t*):
  movzx   eax, WORD PTR [rsi]
  mov WORD PTR [rdi], ax
  movzx   eax, BYTE PTR [rsi+2]
  mov BYTE PTR [rdi+2], al

If the source or the destination is a register, two mov's should suffice -
either the first two or the second two of the above. However, if I write this
(perhaps contrived, but likely demonstrative of what could happen in larger
programs, especially ones with multiple translation units, or when the OS gives
you a pointer to work with, etc.):

  #include <string.h>

  typedef struct { unsigned char data[3]; } uint24_t;

  void f(uint24_t* __restrict__ dest, uint24_t* __restrict__ src) {
    memcpy(dest,src,3); }

  int main() {
    uint24_t* p = (uint24_t*) 48;
    unsigned x;
    f((uint24_t*) &x,p);
    x += 1;
    f(p,(uint24_t*) &x);
    return 0;
  }

The 3-byte value is "constructed" on the stack rather than in a register (first
four mov's), and then one cannot avoid using four more mov's to copy it to the
destination:

movzx   eax, WORD PTR ds:48
mov WORD PTR [rsp-4], ax
movzx   eax, BYTE PTR ds:50
mov BYTE PTR [rsp-2], al
add DWORD PTR [rsp-4], 1
movzx   eax, WORD PTR [rsp-4]
mov WORD PTR ds:48, ax
movzx   eax, BYTE PTR [rsp-2]
mov BYTE PTR ds:50, al


If we do this with 4-byte values, i.e. replace uint24_t with uint32_t, it's a
single mov both ways, and in fact it gets further optimized, so that this:

  #include <string.h>
  #include <stdint.h>

  void f(uint32_t* __restrict__ dest, uint32_t* __restrict__ src)
  {
    memcpy(dest,src,4);
  }

  int main() {
    uint32_t* p = (uint32_t*) 48;
    uint32_t x;
    f(&x,p);
    x += 1;
    f(p,&x);
    return 0;
  }


is compiled into just this:

add DWORD PTR ds:48, 1

Now obviously you can't expect to optimize-out _that_ much with a 3-byte value,
but 2 mov's in and 2 mov's out should be enough. Indeed, clang (since at least
3.4.1 or so) emits this for the uint24_t code:

movzx   eax, byte ptr [50]
shl eax, 16
movzx   ecx, word ptr [48]
lea eax, [rcx + rax + 1]
mov word ptr [48], ax
shr eax, 16
mov byte ptr [50], al

which has just four mov's.
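
The register-based sequence can also be spelled out at the source level. The
following is a rough sketch only: load24, store24, increment24 and
check_increment24 are hypothetical helper names, not anything from GCC or clang,
and it operates on an ordinary buffer instead of the fixed address 48. It
assembles the 3-byte value in a 32-bit integer from a low 16-bit part and a high
8-bit part, adds 1, and writes the three bytes back. Bytes are read
individually in little-endian order, which matches the x86_64 layout the report
is about. Whether a given compiler turns this into the four-mov form is not
guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Build a 24-bit little-endian value in a register-sized integer:
// a low 16-bit part and a high 8-bit part, combined with a shift.
inline std::uint32_t load24(const unsigned char* p)
{
    std::uint32_t lo = static_cast<std::uint32_t>(p[0])
                     | (static_cast<std::uint32_t>(p[1]) << 8);
    std::uint32_t hi = static_cast<std::uint32_t>(p[2]);
    return lo | (hi << 16);
}

// Write back only the low three bytes of v; the byte after them is untouched.
inline void store24(unsigned char* p, std::uint32_t v)
{
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
    p[2] = static_cast<unsigned char>((v >> 16) & 0xFF);
}

// Equivalent of the copy-in, "x += 1", copy-out sequence from the report,
// without going through a temporary on the stack.
inline void increment24(unsigned char* p)
{
    store24(p, load24(p) + 1);
}

// Checks a carry across the 16-bit boundary and that the 4th byte is kept.
void check_increment24()
{
    unsigned char buf[4] = { 0xFF, 0xFF, 0x00, 0xAA };
    increment24(buf);
    assert(buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0x01);
    assert(buf[3] == 0xAA);
}
```

Any carry out of the 24th bit is dropped by store24, which mirrors the original
code copying only three bytes back to the destination.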