[Bug target/114943] New: X86 AVX2: inefficient code generated to convert SIMD Vectors

2024-05-04 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943

Bug ID: 114943
   Summary: X86 AVX2: inefficient code generated to convert SIMD Vectors
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In the example below (see https://godbolt.org/z/qnfT4fE5G ),
covert and covert3 produce code that looks inefficient to me compared with
covert2 (and with clang) for target x86-64-v3.

#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float VECTOR_EXT(16) float32x4_t;
typedef double VECTOR_EXT(32) float64x4_t;

float32x4_t f1,f2,f3,f4,f;
float64x4_t d1,d2,d3,d4,d;


void covert() {
  for (int i=0;i<4;++i) {
    d1[i] = f1[i];
    d2[i] = f2[i];
    d3[i] = f3[i];
    d4[i] = f4[i];
  }
}

void covert2() {
  for (int i=0;i<4;++i)
    d1[i] = f1[i];
  for (int i=0;i<4;++i)
    d2[i] = f2[i];
  for (int i=0;i<4;++i)
    d3[i] = f3[i];
  for (int i=0;i<4;++i)
    d4[i] = f4[i];
}



void covert3() {
  d1 = __builtin_convertvector(f1,float64x4_t);
}
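As a reference point for what the three functions should compute, here is a small sketch (not the reporter's code). It checks that converting each pair with one whole-vector `__builtin_convertvector`, as in covert3, gives the same values as the element-by-element loop of covert. It assumes GCC's vector extensions, and the helpers `convert_scalar` and `convert_whole` are hypothetical names. With AVX, a single `vcvtps2pd` converts four floats in an xmm register to four doubles in a ymm register, so ideally each pair costs one such instruction.

```cpp
#include <cassert>

// Same vector types as in the report (GCC vector extensions).
#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float VECTOR_EXT(16) float32x4_t;
typedef double VECTOR_EXT(32) float64x4_t;

// Element by element, in the same loop order as covert().
inline void convert_scalar(const float32x4_t (&in)[4], float64x4_t (&out)[4]) {
  for (int i = 0; i < 4; ++i)
    for (int k = 0; k < 4; ++k)
      out[k][i] = in[k][i];
}

// One whole-vector conversion per pair, as in covert3().
inline void convert_whole(const float32x4_t (&in)[4], float64x4_t (&out)[4]) {
  for (int k = 0; k < 4; ++k)
    out[k] = __builtin_convertvector(in[k], float64x4_t);
}
```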

[Bug target/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

--- Comment #9 from vincenzo Innocente  ---
We observe that including xmmintrin.h changes the behaviour of some code,
notably abs(x) when x is a float or a double.
And this is platform dependent, as xmmintrin.h is x86_64 specific.
Yes, it has been like this for 20 years, and people have always wondered why
abs(x) behaved differently in different parts of the code; now they are asking
why it behaves differently on x86_64 and on ARM.
The workaround is obvious: use std::abs.

I personally find it very uncomfortable that including xmmintrin.h (even
indirectly, through a cascade of includes) changes the behaviour of "abs(x)".

If everybody on the GCC side is comfortable with this situation, we will just
take note and try to be stricter in visual code inspection.
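A minimal sketch of the workaround, using only standard headers: the qualified `std::abs` picks the overload matching the argument type, whatever another header may have injected into the global namespace. The helper `truncating_abs` (a hypothetical name) mimics what an unqualified `abs(x)` does when only C's `int abs(int)` is visible: the float argument is implicitly converted to `int`, so the fractional part is lost before the absolute value is taken.

```cpp
#include <cassert>
#include <cmath>    // std::abs(float), std::abs(double)
#include <cstdlib>  // std::abs(int), std::abs(long)

// Qualified call: overload resolution selects abs(float) or abs(double).
inline float magnitude(float x) { return std::abs(x); }
inline double magnitude(double x) { return std::abs(x); }

// What a call that resolves to C's int abs(int) does with a float argument:
// the value is first truncated toward zero, then made non-negative.
inline int truncating_abs(float x) { return std::abs(static_cast<int>(x)); }
```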

[Bug target/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

--- Comment #4 from vincenzo Innocente  ---
in C++ one is supposed to #include
 not 

I do not think that there is an explicit version of C++ headers for the
intrinsics that avoids the conflicts between C and C++.

[Bug c++/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

--- Comment #2 from vincenzo Innocente  ---
*** Bug 114483 has been marked as a duplicate of this bug. ***

[Bug c++/114483] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114483

vincenzo Innocente  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from vincenzo Innocente  ---
please close this

*** This bug has been marked as a duplicate of bug 114484 ***

[Bug c++/114484] #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

--- Comment #1 from vincenzo Innocente  ---
xmmintrin.h
includes mm_malloc.h
which 
#include 
which
using std::abs;
(among others)


see
https://godbolt.org/z/cxo65rnr9

or this excerpt from c++ -E dump
```
# 32 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/xmmintrin.h" 2 3 4


# 1 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/mm_malloc.h" 1 3 4
# 27 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/mm_malloc.h" 3 4
# 1 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/stdlib.h" 1 3 4
# 36 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/stdlib.h" 3 4
# 1 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/cstdlib" 1 3 4
# 39 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/cstdlib" 3 4

# 40 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/cstdlib" 3
# 37 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/stdlib.h" 2 3 4

using std::abort;
using std::atexit;
using std::exit;


  using std::at_quick_exit;


  using std::quick_exit;




using std::div_t;
using std::ldiv_t;

using std::abs;
using std::atof;
using std::atoi;
using std::atol;
using std::bsearch;
using std::calloc;
using std::div;
using std::free;
using std::getenv;
using std::labs;
using std::ldiv;
using std::malloc;

using std::mblen;
using std::mbstowcs;
using std::mbtowc;

using std::qsort;
using std::rand;
using std::realloc;
using std::srand;
using std::strtod;
using std::strtol;
using std::strtoul;
using std::system;

using std::wcstombs;
using std::wctomb;
# 28 "/data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/mm_malloc.h" 2 3 4
```
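The effect of the `using std::abs;` line in the dump can be checked at compile time. This sketch assumes GCC's libstdc++, whose `<cstdlib>` also declares the floating-point overloads of `std::abs` (in `bits/std_abs.h`). After `#include <stdlib.h>` in C++, the global `::abs` then takes and returns a `float`, which is what happens when xmmintrin.h pulls in mm_malloc.h. With a different standard library the `static_assert` may not hold.

```cpp
// Assumes libstdc++: in C++ its <stdlib.h> wrapper includes <cstdlib>
// and then does `using std::abs;` at global scope (see the dump above).
#include <stdlib.h>
#include <cassert>
#include <type_traits>

// ::abs(float) now resolves to std::abs(float), not to C's int abs(int).
static_assert(std::is_same<decltype(::abs(1.0f)), float>::value,
              "::abs(float) is the float overload");

inline float global_abs(float x) { return ::abs(x); }  // no truncation to int
```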

[Bug c++/114484] New: #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

Bug ID: 114484
   Summary: #include   changes ::abs in std::abs
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

[Bug c++/114483] New: #include changes ::abs in std::abs

2024-03-26 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114483

Bug ID: 114483
   Summary: #include   changes ::abs in std::abs
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

[Bug tree-optimization/114363] inconsistent optimization of pow(x,2)+pow(y,2)

2024-03-16 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114363

--- Comment #4 from vincenzo Innocente  ---
Thanks Harald, I had missed the point that float z = pow(double(x),2) and
float z = x*x indeed produce exactly the same result, while in all other
cases they of course do not.
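The point can be checked numerically. Two floats have 24-bit significands, so their product needs at most 48 bits and is exact in double (53 bits); rounding pow(double(x),2) to float is therefore a single rounding, i.e. the same as the float product x*x. For a sum of squares, float evaluation rounds each square and then the sum, while double evaluation rounds only once at the end, so the results can differ. The sketch below shows this with x = 1 + 2^-12 and y = 2^-13, values picked for this illustration; the `volatile` temporaries are there so that the float products really are rounded to float and cannot be fused into an FMA.

```cpp
#include <cassert>
#include <cmath>

// Single square: exact in double, then one rounding to float.
inline float square_via_double(float x) {
  return static_cast<float>(std::pow(static_cast<double>(x), 2));
}
inline float square_float(float x) { return x * x; }

// Sum of squares in float: each product and the sum are rounded.
inline float sum_sq_float(float x, float y) {
  volatile float xx = x * x;  // volatile: force rounding, prevent FMA contraction
  volatile float yy = y * y;
  return xx + yy;
}

// Sum of squares in double, rounded to float only at the end.
inline float sum_sq_via_double(float x, float y) {
  return static_cast<float>(std::pow(static_cast<double>(x), 2) +
                            std::pow(static_cast<double>(y), 2));
}
```

With these inputs, x*x = 1 + 2^-11 + 2^-24 lies exactly halfway between two floats and rounds down (ties to even); adding y*y = 2^-26 cannot undo that, whereas the exact double sum lies just above the halfway point and rounds up to 1 + 2^-11 + 2^-23.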

[Bug tree-optimization/114363] New: inconsistent optimization of pow(x,2)+pow(y,2)

2024-03-16 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114363

Bug ID: 114363
   Summary: inconsistent optimization of pow(x,2)+pow(y,2)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

While pow(x,2) (with float x) is optimized into x*x,
in pow(x,2)+pow(y,2) x and y are first promoted to double,
which I find inconsistent.

see
https://godbolt.org/z/rYfoaxr89

[Bug libstdc++/112649] New: [c++23] in presence of inline functions and debug-info stacktrace reports the deepest callee

2023-11-21 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112649

Bug ID: 112649
   Summary: [c++23] in presence of inline functions and debug-info stacktrace reports the deepest callee
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

Created attachment 56657
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56657&action=edit
a small demo demonstrating the description.

Feature or defect?
Or a missing feature in std::stacktrace...

What I find disturbing is that the "symbol name" differs for the very same
pc depending on whether the code has been compiled with "-g" or not:
with debug info it is set to the deepest callee, without it to the outermost
caller.

Maybe this is an issue for the standard library committee?

DEMO:
compile the attached demo program with
c++ -std=c++23 stackTraceDemo.cpp -lstdc++exp -O2 -DINCLUDE='' -g
and run it, then do the same without -g.
One can also run gdb to compare with the demo output:
```
gdb ./a.out
b instrumentedFunc
run
where
```

Details:
in libstdc++-v3/src/c++23/stacktrace.cc
  bool
  stacktrace_entry::_Info::_M_populate(native_handle_type pc)
  {
    auto cb = [](void* self, uintptr_t, const char* filename, int lineno,
                 const char* function) -> int
    {
      auto& info = *static_cast<_Info*>(self);
      info._M_set_desc(function);
      info._M_set_file(filename);
      if (info._M_line)
        *info._M_line = lineno;
      return function != nullptr;
    };
    const auto state = init();
    if (::__glibcxx_backtrace_pcinfo(state, pc, +cb, err_handler, this))
      return true;

According to the documentation of __glibcxx_backtrace_pcinfo:
/* Given PC, a program counter in the current program, call the
   callback function with filename, line number, and function name
   information.  This will normally call the callback function exactly
   once.  However, if the PC happens to describe an inlined call, and
   the debugging information contains the necessary information, then
   this may call the callback function multiple times.  This will make
   at least one call to either CALLBACK or ERROR_CALLBACK.  This
   returns the first non-zero value returned by CALLBACK, or 0.  */

From my tests, the last sentence means that the callback is called again only
if it returns 0.
So in the current implementation, which returns non-zero as soon as a function
name is found, it will be called just once even in the presence of inline
functions, and therefore the stacktrace entry will be set to the deepest callee.
If one instead waits until the last call (always returning "false"), one can set
the entry to the outermost caller, or even record the full call chain (as GDB
does). This last option does not seem to fit the std::stacktrace interface.
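The difference between the two strategies can be modelled without libbacktrace. In the sketch below, `fake_pcinfo` is a hypothetical stand-in for `__glibcxx_backtrace_pcinfo`: for one PC it reports an inlined call chain innermost first (the order visible in the `>>` lines of the demo output below) and stops at the first non-zero callback return, as the quoted documentation says. It is a model under those assumptions, not the libstdc++ code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical callback type: one call per frame of the inlined chain.
using FrameCallback = int (*)(void* data, const char* function);

// Model of pcinfo: frames are reported innermost first; iteration stops at
// the first non-zero value returned by the callback, which is returned.
inline int fake_pcinfo(const std::vector<std::string>& chain,
                       FrameCallback cb, void* data) {
  for (const auto& fn : chain)
    if (int r = cb(data, fn.c_str()))
      return r;
  return 0;
}

// Current strategy: return non-zero at once, so only the deepest callee is seen.
inline std::string deepest_only(const std::vector<std::string>& chain) {
  std::string name;
  fake_pcinfo(chain, [](void* d, const char* f) -> int {
    *static_cast<std::string*>(d) = f;
    return 1;
  }, &name);
  return name;
}

// Alternative: always return 0, so every frame is seen; the last is the outermost caller.
inline std::vector<std::string> whole_chain(const std::vector<std::string>& chain) {
  std::vector<std::string> names;
  fake_pcinfo(chain, [](void* d, const char* f) -> int {
    static_cast<std::vector<std::string>*>(d)->push_back(f);
    return 0;
  }, &names);
  return names;
}
```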


--
Here is the output of the demo (I prefer to print the stacktrace in reverse order):
lines starting with # come from the stacktrace entry;
lines starting with >> come from __glibcxx_backtrace_pcinfo with a callback always returning "false".

[innocent@patatrack01 demos]$ c++ -std=c++23 stackTraceDemo.cpp -lstdc++exp -O2 -DINCLUDE=''; ./a.out
#0 0x  :0
#1 0x40164d _start :0
#2 0x7f4412f23d84 __libc_start_main :0
#3 0x40159a main :0
#4 0x401eeb func(int) :0
#5 0x401ab0 instrumentedFunc(int) :0
10
[innocent@patatrack01 demos]$ c++ -std=c++23 stackTraceDemo.cpp -lstdc++exp -O2 -DINCLUDE='' -g; ./a.out
#0 0x  :0
#1 0x40164d _start :0
#2 0x7ff80f90ed84 __libc_start_main :0
#3 0x40159a main /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:116
>> 1 main /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:116
#4 0x401eeb nestedFunc2(int) /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:101
>> 1 _Z11nestedFunc2i /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:101
>> 2 _Z10nestedFunci /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:106
>> 3 _Z4funci /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:112
#5 0x401ab0 instrumentedFunc(int) /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:91
>> 1 _Z16instrumentedFunci /data/user/innocent/MallocProfiler/demos/stackTraceDemo.cpp:91
10
[innocent@patatrack01 demos]$ gdb ./a.out
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-19.el8
...
Reading symbols from ./a.out...done.
(gdb) b instrumentedFunc
Breakpoint 1 at 0x401a90: file /afs/cern.ch/work/i/innocent/public/w5/include/c++/14.0.0/bits/new_allocator.h, line 88.
(gdb) run
Starting program: /data/user/innocent/MallocProfiler/demos/a.out
Breakpoint 1, instrumentedFunc (c=4) at /afs/cern.ch/work/i/innocent/public/w5/include/c++/14.0.0/bits/new_allocator.h:88
88      __new_allocator() _GLIBCXX_USE_NOEXCEPT { }
(gdb) where
#0  instrumentedFunc (c=4) at /afs/cern.ch/work/i/innocent/public/w5/include/c++/14.0.0/bits/new_a

[Bug libstdc++/112348] [C++23] defect in struct hash>

2023-11-09 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112348

--- Comment #1 from vincenzo Innocente  ---
This patch works for me

diff --git a/libstdc++-v3/include/std/stacktrace b/libstdc++-v3/include/std/stacktrace
index da0e48d3532..9a0d0b16068 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -797,7 +797,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   size_t
   operator()(const basic_stacktrace<_Allocator>& __st) const noexcept
   {
-   hash __h;
+   hash __h;
size_t __val = _Hash_impl::hash(__st.size());
for (const auto& __f : __st)
  __val = _Hash_impl::__hash_combine(__h(__f), __val);

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-05 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

--- Comment #12 from vincenzo Innocente  ---
I confirm that the patch solves the issue:

c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINLIB -fpic -shared -o liba.so -ldl;c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -L. -la -Wl,-rpath=.; ./a.out
   0# nested_func2(int) at /data/user/innocent/MallocProfiler/tests/testStacktrace.cpp:63
   1# nested_func(int) at /data/user/innocent/MallocProfiler/tests/testStacktrace.cpp:93
   2# func(int) at /data/user/innocent/MallocProfiler/tests/testStacktrace.cpp:101
   3# main at /data/user/innocent/MallocProfiler/tests/testStacktrace.cpp:106
   4# __libc_start_main at :0
   5# _start at :0
   6#

The last, empty entry is a different story, I suppose (not an issue at
the moment).

Thanks again for the fast action!

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-03 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

--- Comment #8 from vincenzo Innocente  ---
Thanks Ian for the patch.
For testing I will need the full git diff (including the Makefile itself, as my
autoconf is not compatible with gcc14).

Backports down to gcc12 would be appreciated.
Could you please post a note here when the patch lands in the various branches?

[Bug libstdc++/112348] New: [C++23] defect in struct hash>

2023-11-02 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112348

Bug ID: 112348
   Summary: [C++23] defect in struct
hash>
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

gcc version 14.0.0 20231028 (experimental) [master r14-4988-g5d2a360f0a5] (GCC)

 auto k = std::hash()(std::stacktrace::current());

does not compile for me; the error is:

In instantiation of 'std::size_t std::hash
>::operator()(const std::basic_stacktrace<_Allocator>&) const [with _Allocator
= std::allocator; std::size_t = long unsigned int]':
testStacktrace.cpp:39:41:   required from here
   39 |auto k = std::hash()(std::stacktrace::current());
  | ^~~~
/afs/cern.ch/work/i/innocent/public/w5/include/c++/14.0.0/stacktrace:803:49:
error: no match for call to '(std::hash) (const
std::stacktrace_entry&)'
  803 |   __val = _Hash_impl::__hash_combine(__h(__f), __val);
  |  ~~~^


I changed
// hash __h;
hash __h;

and it compiled.
(I suspect that hashing __f.native_handle() would work as well.)

I am surprised it passed the tests.
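Hashing the whole trace follows the pattern visible in the patch: seed with the size, then fold in one hash per entry. The sketch below hashes each entry through its native handle (the `__f.native_handle()` alternative mentioned above). `Frame` is a hypothetical stand-in for `std::stacktrace_entry`, and the mixing step is the well-known Boost-style `hash_combine` formula, used here because `_Hash_impl::__hash_combine` is a libstdc++ internal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for std::stacktrace_entry: just a program counter.
struct Frame {
  std::uintptr_t pc;
  std::uintptr_t native_handle() const { return pc; }
};

// Seed with the number of frames, then combine one hash per frame.
inline std::size_t hash_trace(const std::vector<Frame>& trace) {
  std::hash<std::uintptr_t> h;  // hashes the native handle, not the entry type
  std::size_t seed = std::hash<std::size_t>{}(trace.size());
  for (const auto& f : trace)
    seed ^= h(f.native_handle()) + 0x9e3779b9 + (seed << 6) + (seed >> 2);  // Boost-style mix
  return seed;
}
```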

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-01 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

--- Comment #6 from vincenzo Innocente  ---
Sorry, I have now done (almost) the full exercise: I read the documentation at
https://en.cppreference.com/w/cpp/utility/stacktrace_entry
and the code in the stacktrace header and in
libstdc++-v3/src/c++23/stacktrace.cc
(I have not read the specification in the C++23 standard).
Indeed, the entry implementation has just the handle as its data member,
and the details are retrieved when the "query" methods are called.
This appears to happen in
stacktrace_entry::_Info::_M_populate(native_handle_type pc),
which in turn calls
::__glibcxx_backtrace_pcinfo;
if that fails, it calls
::__glibcxx_backtrace_syminfo.

So most probably the issue is in this last function, unless there is a problem
with the logic in _M_populate that I failed to identify.
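The lazy design described above can be modelled in a few lines. `LazyFrame` below is hypothetical: it stores only the program counter at capture time and runs the symbol lookup when `description()` is first called. The resolver is a plain function pointer, so a slow lookup (for example one based on `dladdr`) costs nothing until a name is actually needed; caching the result is a choice made in this sketch, not something taken from libstdc++.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of a lazily resolved frame: only the program counter is
// stored at capture time; the (possibly slow) lookup runs on the first query.
class LazyFrame {
 public:
  using Resolver = std::string (*)(std::uintptr_t pc);

  LazyFrame(std::uintptr_t pc, Resolver resolve) : pc_(pc), resolve_(resolve) {}

  std::uintptr_t native_handle() const { return pc_; }

  // Resolve on demand and cache the result (the caching is specific to this sketch).
  const std::string& description() const {
    if (!resolved_) {
      name_ = resolve_(pc_);
      resolved_ = true;
    }
    return name_;
  }

 private:
  std::uintptr_t pc_;
  Resolver resolve_;
  mutable bool resolved_ = false;
  mutable std::string name_;
};
```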

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-11-01 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

--- Comment #5 from vincenzo Innocente  ---
So if, next to
std::cout << std::stacktrace::current() << '\n';
I add
   Dl_info dlinfo;
   for (auto & entry : std::stacktrace::current() ) {
     dladdr((const void*)(entry.native_handle()),&dlinfo);
     std::cout << dlinfo.dli_sname << ' ' << dlinfo.dli_fname <<'\n';
   }
I get what is needed:

 c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINLIB -fpic -shared -o liba.so -ldl ; c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -L. -la -Wl,-rpath=. ; ./a.out
   0#  at :0
   1#  at :0
   2# func(int) at
/data/user/innocent/MallocProfiler/tests/testStacktrace.cpp:44
   3# main at /data/user/innocent/MallocProfiler/tests/testStacktrace.cpp:49
   4#  at :0
   5# _start at :0
   6#

_Z12nested_func2i ./liba.so
_Z11nested_funci ./liba.so

(of course not demangled)

So is it a feature or a defect?

I'm not sure how the implementation works (I did not look at the code), but
dladdr can be slow and may "hang" in some situations.
So it would be useful to have an option whereby the "name" is not resolved
immediately, together with a function that returns the name from the
native_handle "asynchronously".
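To get readable names from the `dladdr` workaround, the mangled `dli_sname` can be passed through `abi::__cxa_demangle` from `<cxxabi.h>`. A minimal sketch, assuming glibc (`dladdr` needs `_GNU_SOURCE`, which g++ defines by default, and linking with `-ldl` on glibc versions before 2.34); the function names `demangle` and `symbol_name` are hypothetical:

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (glibc extension)
#include <cxxabi.h>  // abi::__cxa_demangle
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol; fall back to the raw name on failure.
inline std::string demangle(const char* mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out != nullptr) ? out : mangled;
  std::free(out);  // __cxa_demangle allocates with malloc; free(nullptr) is fine
  return result;
}

// Resolve a code address (e.g. entry.native_handle()) to a demangled name,
// or return an empty string if dladdr cannot attribute it to a symbol.
inline std::string symbol_name(std::uintptr_t pc) {
  Dl_info info{};
  if (dladdr(reinterpret_cast<const void*>(pc), &info) == 0 || info.dli_sname == nullptr)
    return {};
  return demangle(info.dli_sname);
}
```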

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-10-31 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

--- Comment #4 from vincenzo Innocente  ---
Intel x86_64:
uname -a
Linux patatrack01 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Thu May 18 10:27:05 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

boost::stacktrace works; I can provide an example.

[Bug libbacktrace/112263] [C++23] std::stacktrace does not identify symbols in shared library

2023-10-30 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

vincenzo Innocente  changed:

   What|Removed |Added

 CC||ian at gcc dot gnu.org
  Component|libstdc++   |libbacktrace

--- Comment #2 from vincenzo Innocente  ---
I suspect libbacktrace, even though I have no way to test it outside
std::stacktrace.

[Bug libstdc++/112263] New: [C++23] std::stacktrace does not identify symbols in shared library

2023-10-28 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112263

Bug ID: 112263
   Summary: [C++23] std::stacktrace does not identify symbols in shared library
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

Using
gcc version 14.0.0 20231028 (experimental) [master r14-4988-g5d2a360f0a5] (GCC),
which contains the fix for bug 111936,

this simple example [1] prints all the symbols in the stacktrace when built as a
single executable, but when the nested functions are in a shared library their
names are missing:
c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -DINLIB ; ./a.out
   0# nested_func2(int) at
/afs/cern.ch/user/i/innocent/public/ctest/testStacktrace.cpp:13
   1# nested_func(int) at
/afs/cern.ch/user/i/innocent/public/ctest/testStacktrace.cpp:18
   2# func(int) at
/afs/cern.ch/user/i/innocent/public/ctest/testStacktrace.cpp:26
   3# main at /afs/cern.ch/user/i/innocent/public/ctest/testStacktrace.cpp:31
   4#  at :0
   5# _start at :0
   6#


c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINLIB -fpic -shared -o liba.so ; c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -L. -la -Wl,-rpath=. ; ./a.out
   0#  at :0
   1#  at :0
   2# func(int) at
/afs/cern.ch/user/i/innocent/public/ctest/testStacktrace.cpp:26
   3# main at /afs/cern.ch/user/i/innocent/public/ctest/testStacktrace.cpp:31
   4#  at :0
   5# _start at :0
   6#



[1]
cat testStacktrace.cpp
//compile and run with either
// c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -DINLIB; ./a.out
// or
// c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINLIB -fpic -shared -o liba.so
// c++ -std=c++23 testStacktrace.cpp -lstdc++exp -g -DINMAIN -L. -la -Wl,-rpath=.; ./a.out
//
#include <iostream>
#include <stacktrace>


#ifdef INLIB
int nested_func2(int c)
{
std::cout << std::stacktrace::current() << '\n';
return c + 1;
}
int nested_func(int c)
{
return nested_func2(c + 1);
}
#else
int nested_func(int c);
#endif
#ifdef INMAIN
int func(int b)
{
return nested_func(b + 1);
}

int main()
{
std::cout << func(777);
   return 0;
}
#endif

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936

--- Comment #9 from vincenzo Innocente  ---
Thanks for the second patch.
I was indeed struggling with automake versions (1.15 vs 1.16).

Any chance of a backport to gcc12 (our current production version)?

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936

--- Comment #7 from vincenzo Innocente  ---
Not explicitly in the src tree:
I only ran configure in the build directory.
What do I need to run in the src tree?

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936

--- Comment #5 from vincenzo Innocente  ---
My bad: I have not been using archive libraries for a long time and forgot
about the link-order rule.

The issue is indeed the missing -fPIC.
Thanks for the fast action.

I applied the patch, but it does not seem to be sufficient.

If I understood correctly, this is where the archive is built:
ar  rc .libs/libstdc++_libbacktrace.a  std_stacktrace-atomic.o
std_stacktrace-backtrace.o std_stacktrace-dwarf.o std_stacktrace-fileline.o
std_stacktrace-posix.o std_stacktrace-sort.o std_stacktrace-simple.o
std_stacktrace-state.o std_stacktrace-cp-demangle.o std_stacktrace-elf.o
std_stacktrace-mmapio.o std_stacktrace-mmap.o

but those are the object files compiled without -fPIC;
those compiled with -fPIC are under .libs itself...

So I ran manually
```
ar rc .libs/libstdc++_libbacktrace.a .libs/*.o ../c++23/stacktrace.o
```

and then, locally,
c++ -O3 -pthread -fPIC -shared -std=c++23 getStacktrace.cc /data/user/innocent/gcc_build/x86_64-pc-linux-gnu/libstdc++-v3/src/libbacktrace/.libs/libstdc++_libbacktrace.a -g -o mallocHook.so

and ran it:
setenv LD_PRELOAD ./mallocHook.so ; ./a.out ; unsetenv LD_PRELOAD
asked 4 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 8 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 16 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 32 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 64 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 128 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 256 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##
asked 512 at ###std::__new_allocator::allocate(unsigned long, void
const*)#std::allocator_traits
>::allocate(std::allocator&, unsigned long)#void std::vector >::_M_realloc_insert(__gnu_cxx::__normal_iterator
> >, int const&)#std::vector >::push_back(int
const&)#go(int)#main##_start##

[Bug c++/111934] ICE internal compiler error: in discriminator_for_local_entity, at cp/mangle.cc:2065

2023-10-24 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111934

--- Comment #3 from vincenzo Innocente  ---
With
gcc version 14.0.0 20231024 (experimental) [master r14-4877-g724badcadf8] (GCC)
I get the same ICE.

Please note that one needs to include "iostream"
(in my test, compile with "-DICE")
to trigger the ICE;
without it, the compiler just emits the syntax errors, as one would expect.

[Bug libstdc++/111936] std::stacktrace cannot be used in a shared library

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936

--- Comment #1 from vincenzo Innocente  ---
Here is a minimal malloc hook that I would like to use:
[innocent@patatrack01 ctest]$ cat getStacktrace.cc
#include <stacktrace>

  std::string get_stacktrace() {
     std::string trace;
     for (auto & entry : std::stacktrace::current() )
       trace += entry.description() + '#';
     return trace;
  }


#include <malloc.h>
#include <cstdlib>
#include <iostream>

extern "C"
void * myMallocHook(size_t size, void const * caller) {
  __malloc_hook = nullptr;
  auto p = malloc(size);
  std::cout << "asked " << size
<< " at " << get_stacktrace()
<< std::endl;
  __malloc_hook = myMallocHook;
  return p;
}

namespace {
struct Hook {
  Hook() {
  __malloc_hook = myMallocHook;
  }
};

  Hook hook;

}

compiled as
c++ -O3 -Wall -pthread -fPIC -shared -std=c++23 -lstdc++exp getStacktrace.cc

and preloaded, it gives an undefined symbol:

 setenv LD_PRELOAD ./a.out ; ls ; unsetenv LD_PRELOAD
ls: symbol lookup error: ./a.out: undefined symbol: _ZNSt17__stacktrace_impl10_S_currentEPFiPvmES0_i
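The hook above avoids recursing into itself by resetting the global `__malloc_hook` while it runs, which affects every thread at once. A common alternative, sketched here independently of any particular hook API, is a per-thread flag, so that the allocations made by the hook itself (for instance while building the stacktrace string) are simply not traced. `ReentryGuard` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical RAII guard: marks the current thread as "inside the hook".
// A guard constructed while another one is alive on the same thread is
// inactive, so the caller can skip tracing that nested allocation.
class ReentryGuard {
 public:
  ReentryGuard() : active_(!inside_) { inside_ = true; }
  ~ReentryGuard() { if (active_) inside_ = false; }
  ReentryGuard(const ReentryGuard&) = delete;
  ReentryGuard& operator=(const ReentryGuard&) = delete;
  bool active() const { return active_; }

 private:
  static thread_local bool inside_;  // one flag per thread
  bool active_;
};

thread_local bool ReentryGuard::inside_ = false;
```

Inside a hook one would construct a `ReentryGuard` first and build and print the trace only if `active()` returns true.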

[Bug libstdc++/111936] New: std::stacktrace cannot be used in a shared library

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111936

Bug ID: 111936
   Summary: std::stacktrace cannot be used in a shared library
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

I would like to use std::stacktrace in a shared library to be preloaded...

When I try to build the library, even for this minimal example,
cat getStacktrace.cc
#include <stacktrace>

  std::string get_stacktrace() {
     std::string trace;
     for (auto & entry : std::stacktrace::current() )
       trace += entry.description() + '#';
     return trace;
  }

it fails:
 c++ -O3 -Wall -pthread -fPIC -shared getStacktrace.cc -std=c++23 -lstdc++exp
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-fileline.o):
relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a
shared object; recompile with -fPIC
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-posix.o):
relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a
shared object; recompile with -fPIC
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-simple.o):
relocation R_X86_64_32 against `.text' can not be used when making a shared
object; recompile with -fPIC
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-elf.o):
relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a
shared object; recompile with -fPIC
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-mmap.o):
relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a
shared object; recompile with -fPIC
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-mmapio.o):
relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a
shared object; recompile with -fPIC
/usr/bin/ld:
/afs/cern.ch/work/i/innocent/public/w5/bin/../lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../lib64/libstdc++exp.a(std_stacktrace-dwarf.o):
relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a
shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status


It compiles silently with
[innocent@patatrack01 ctest]$ c++ -O3 -Wall -pthread -fPIC -shared -std=c++23 -lstdc++exp getStacktrace.cc

but the symbols are left undefined:

[innocent@patatrack01 ctest]$ ldd ./a.out
linux-vdso.so.1 (0x7ffd50f73000)
libstdc++.so.6 => /afs/cern.ch/user/i/innocent/w5/lib64/libstdc++.so.6
(0x7fa9437f8000)
libm.so.6 => /usr/lib64/libm.so.6 (0x7fa943476000)
libgcc_s.so.1 => /afs/cern.ch/user/i/innocent/w5/lib64/libgcc_s.so.1
(0x7fa94324b000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x7fa94302b000)
libc.so.6 => /usr/lib64/libc.so.6 (0x7fa942c66000)
/lib64/ld-linux-x86-64.so.2 (0x7fa943e68000)
[innocent@patatrack01 ctest]$ nm -C ./a.out | grep stack
0db0 T get_stacktrace[abi:cxx11]()
0be0 t get_stacktrace[abi:cxx11]() [clone .cold]
0d20 t std::basic_stacktrace
>::current(std::allocator const&) [clone .isra.0]
 U std::stacktrace_entry::_Info::_M_populate(unsigned long)
1430 W std::stacktrace_entry::_Info::_S_set[abi:cxx11](void*, char
const*)
 U std::__stacktrace_impl::_S_current(int (*)(void*, unsigned
long), void*, int)
1310 W std::basic_stacktrace
>::_M_prepare(unsigned short)::{lambda(void*, unsigned long)#1}::_FUN(void*,
unsigned long)


and at run time (not with this example, but with my full application, which
invokes the stacktrace from a malloc hook) it (obviously) fails:

[innocent@patatrack01 ctest]$ c++ -O3 -Wall -pthread -fPIC -shared -std=c++23 -lstdc++exp mallocWrapper.cc
[innocent@patatrack01 ctest]$ setenv LD_PRELOAD ./a.out ; ls ; unsetenv LD_PRELOAD
Recoding structure constructed in a thread
ls: symbol lookup error: ./a.out: undefined symbol: _ZNSt17__stacktrace_impl10_S_currentEPFiPvmES0_i

[Bug c++/111934] ICE internal compiler error: in discriminator_for_local_entity, at cp/mangle.cc:2065

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111934

--- Comment #1 from vincenzo Innocente  ---
Sorry, I missed the version:

gcc version 14.0.0 20231021 (experimental) [master r14-4817-g405a4140fc3] (GCC)

[Bug c++/111934] New: ICE internal compiler error: in discriminator_for_local_entity, at cp/mangle.cc:2065

2023-10-23 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111934

Bug ID: 111934
   Summary: ICE  internal compiler error: in
discriminator_for_local_entity, at cp/mangle.cc:2065
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

#ifdef ICE
#include 
#endif

struct  Me {

  static Me & me() {
    thread_local auto me = std::make_unique_ptr<Me>();
    return *me;
  }

};


int main() {
  return 0;
}

c++ -O3 -Wall -pthread ice13.cpp
ice13.cpp: In static member function 'static Me& Me::me()':
ice13.cpp:8:33: error: 'make_unique_ptr' is not a member of 'std'
8 | thread_local auto me = std::make_unique_ptr<Me>();
  | ^~~
ice13.cpp:8:51: error: expected primary-expression before '>' token
8 | thread_local auto me = std::make_unique_ptr<Me>();
  |   ^
ice13.cpp:8:53: error: expected primary-expression before ')' token
8 | thread_local auto me = std::make_unique_ptr<Me>();
  | ^


==


ctest]$ c++ -O3 -Wall -pthread ice13.cpp -DICE
ice13.cpp: In static member function 'static Me& Me::me()':
ice13.cpp:8:33: error: 'make_unique_ptr' is not a member of 'std'
8 | thread_local auto me = std::make_unique_ptr<Me>();
  | ^~~
ice13.cpp:8:51: error: expected primary-expression before '>' token
8 | thread_local auto me = std::make_unique_ptr<Me>();
  |   ^
ice13.cpp:8:53: error: expected primary-expression before ')' token
8 | thread_local auto me = std::make_unique_ptr<Me>();
  | ^
ice13.cpp: At global scope:
ice13.cpp:8:23: internal compiler error: in discriminator_for_local_entity, at
cp/mangle.cc:2065
8 | thread_local auto me = std::make_unique_ptr<Me>();
  |   ^~
0x7de25d discriminator_for_local_entity
../../gcc_src/gcc/cp/mangle.cc:2065
0xb92a4a write_local_name
../../gcc_src/gcc/cp/mangle.cc:2164
0xb92a4a write_name
../../gcc_src/gcc/cp/mangle.cc:1071
0xb94e46 write_encoding
../../gcc_src/gcc/cp/mangle.cc:864
0xb94f5b write_mangled_name
../../gcc_src/gcc/cp/mangle.cc:810
0xb95740 mangle_decl_string
../../gcc_src/gcc/cp/mangle.cc:4092
0xb9592a get_mangled_id
../../gcc_src/gcc/cp/mangle.cc:4113
0xb9592a mangle_decl(tree_node*)
../../gcc_src/gcc/cp/mangle.cc:4151
0x16512bd decl_assembler_name(tree_node*)
../../gcc_src/gcc/tree.cc:715
0xe4a329 symbol_table::insert_to_assembler_name_hash(symtab_node*, bool)
../../gcc_src/gcc/symtab.cc:175
0xe4a48c symbol_table::symtab_initialize_asm_name_hash()
../../gcc_src/gcc/symtab.cc:267
0xe4ae84 symbol_table::symtab_initialize_asm_name_hash()
../../gcc_src/gcc/symtab.cc:1078
0xe4ae84 symtab_node::get_for_asmname(tree_node const*)
../../gcc_src/gcc/symtab.cc:1066
0xe5fc61 handle_alias_pairs
../../gcc_src/gcc/cgraphunit.cc:1528
0xe64fa7 symbol_table::finalize_compilation_unit()
../../gcc_src/gcc/cgraphunit.cc:2541
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug tree-optimization/109885] New: gcc does not generate movmskps and testps instructions (clang does)

2023-05-17 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Bug ID: 109885
   Summary: gcc does not generate movmskps and testps instructions
 (clang does)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this simple code (on AVX2):

int sum(float const * x) {
   int ret = 0;
   for (int i=0; i<8; ++i) ret +=(0==x[i]);
   return ret;
}

int one(float const * x) {
   int ret = 0;
   for (int i=0; i<8; ++i) ret |=(0==x[i]);
   return ret;
}

int all(float const * x) {
   int ret = 1;
   for (int i=0; i<8; ++i) ret &=(0==x[i]);
   return ret;
}

Clang uses movmskps and testps instructions; GCC does not.

See for instance:

https://godbolt.org/z/r11r8xoYz
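
For illustration, a hand-written sketch of the kind of movmskps-based reduction meant above, assuming x86-64 (where SSE is baseline) and using two 128-bit halves instead of one 256-bit compare so that it builds without extra -m flags. The helper zero_mask8 and the *_mask function names are hypothetical, not taken from the report or from Clang's output.

```cpp
#include <immintrin.h>

// Hypothetical helper: bit i of the result is set when x[i] == 0.0f, i in [0, 8).
// cmpeqps produces an all-ones lane per match, movmskps packs the lane sign bits.
static inline unsigned zero_mask8(const float* x) {
    const __m128 zero = _mm_setzero_ps();
    const unsigned lo = static_cast<unsigned>(
        _mm_movemask_ps(_mm_cmpeq_ps(_mm_loadu_ps(x), zero)));
    const unsigned hi = static_cast<unsigned>(
        _mm_movemask_ps(_mm_cmpeq_ps(_mm_loadu_ps(x + 4), zero)));
    return lo | (hi << 4);
}

// Same results as sum/one/all above, computed from the 8-bit lane mask.
int sum_mask(const float* x) { return __builtin_popcount(zero_mask8(x)); } // count of zero lanes
int one_mask(const float* x) { return zero_mask8(x) != 0; }               // any lane is zero
int all_mask(const float* x) { return zero_mask8(x) == 0xFFu; }           // every lane is zero
```

With AVX enabled, the same reduction could use a single 256-bit compare (_mm256_cmp_ps with _CMP_EQ_OQ) followed by _mm256_movemask_ps.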

[Bug c++/109281] New: use std::optional results in suboptimal code

2023-03-25 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109281

Bug ID: 109281
   Summary: use std::optional results in suboptimal code
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In the following (almost real) code, GCC emits suboptimal code when std::optional
is used, compared with a home-made equivalent and with Clang.

see https://godbolt.org/z/Pba51Ye7Y


-

code


#include <optional>

// #define USE_OPTIONAL

#ifdef USE_OPTIONAL
struct SubRingCrossings {
  SubRingCrossings(int ci, int ni, float nd) : closestIndex(ci), nextIndex(ni),
      nextDistance(nd) {}

  int closestIndex;
  int nextIndex;
  float nextDistance;
};
#else
struct SubRingCrossings {
  SubRingCrossings() : valid(false) {}
  SubRingCrossings(int ci, int ni, float nd) : valid(true), closestIndex(ci),
      nextIndex(ni), nextDistance(nd) {}

  bool valid;
  int closestIndex;
  int nextIndex;
  float nextDistance;
};
#endif

bool condition();

#ifdef USE_OPTIONAL
std::optional<SubRingCrossings> foo() {
  if (condition()) {
    return std::nullopt;
  }
  return SubRingCrossings(1, 2, 3.14);
}
#else
SubRingCrossings foo() {
  if (condition()) {
    return SubRingCrossings();
  }
  return SubRingCrossings(1, 2, 3.14);
}
#endif

int bar() {
  auto tmp = foo();
#ifdef USE_OPTIONAL
  if (tmp) {
    return tmp->closestIndex;
#else
  if (tmp.valid) {
    return tmp.closestIndex;
#endif
  } else {
    return 0;
  }
}

[Bug tree-optimization/109011] New: missed optimization in presence of __builtin_ctz

2023-03-03 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109011

Bug ID: 109011
   Summary: missed optimization in presence of __builtin_ctz
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In the following code, foo does not vectorize, while bar does.
Clang vectorizes foo using a pattern that invokes vplzcntd.

(code made a bit complex to make vectorization "relevant") 

see https://godbolt.org/z/5fa1zbPeG

#include <cstdint>
uint32_t x[256];
uint32_t y[256];
uint32_t w[256];
uint32_t z[256];

void foo() {
  for (int i=0; i<256;i++) {
    auto p = x[i] ?  __builtin_ctz(x[i]) : y[i];
    z[i] = w[i]*p;
  }
}


void bar() {
  for (int j=0; j<256;j+=8)
  for (int i=j; i
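
The report only says that Clang's code for foo uses vplzcntd. One identity that makes a leading-zero count usable for a trailing-zero count is: for nonzero 32-bit x, x & -x keeps only the lowest set bit, so ctz(x) == 31 - clz(x & -x). A scalar sketch of that identity, with foo_lzcnt as a hypothetical rewrite of foo; whether GCC vectorizes this form is not claimed here.

```cpp
#include <cstdint>

// For x != 0: x & (0u - x) isolates the lowest set bit, whose index is both
// ctz(x) and 31 - clz(x & -x). (Both builtins are undefined for x == 0.)
inline uint32_t ctz_via_clz(uint32_t x) {
    return 31u - static_cast<uint32_t>(__builtin_clz(x & (0u - x)));
}

// Hypothetical rewrite of foo using the clz-based form; when x[i] == 0 the
// source selects y[i], so ctz_via_clz is never called with 0.
void foo_lzcnt(const uint32_t* x, const uint32_t* y, const uint32_t* w,
               uint32_t* z, int n) {
    for (int i = 0; i < n; ++i) {
        uint32_t p = x[i] ? ctz_via_clz(x[i]) : y[i];
        z[i] = w[i] * p;
    }
}
```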

[Bug tree-optimization/108804] New: missed vectorization in presence of conversion from uint64_t to float

2023-02-15 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108804

Bug ID: 108804
   Summary: missed vectorization in presence of conversion from
uint64_t to float
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In the following code [1], foo does not vectorize, while bar does.
Compiled with -march=haswell -Ofast --no-math-errno -Wall;
see
https://godbolt.org/z/E6xzfavxc

Clang seems to do better.

[1]
#include <cstdint>

uint64_t d[512];
//uint32_t f[1024];
float f[1024];

void foo() {
  for (int i=0; i<512; ++i) {
    uint64_t k = d[i];
    auto x  = (k & 0x007F) |  0x3F80;  // x is uint64_t
    k = k >> 23;
    auto y  = (k & 0x007F) |  0x3F80;  // y is uint64_t
    f[i]=x; f[128+i] = y;
  }
}

void bar() {
  for (int i=0; i<512; ++i) {
    uint64_t k = d[i];
    uint32_t x  = (k & 0x007F);
    x |= 0x3F80;
    uint32_t y  = k >> 23;
    y  = (y & 0x007F) |  0x3F80;
    f[i]=x; f[128+i] = y;
  }
}

[Bug tree-optimization/108677] wrong vectorization (when copy constructor is present?)

2023-02-06 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108677

--- Comment #3 from vincenzo Innocente  ---
Sorry, the original internal bug report was for GCC 7.5:
https://godbolt.org/z/9crafbqen

where I think the generated code is indeed wrong (and does not depend on the
presence of the constructor!).

So, if anything, the bug should be changed to: removing the constructor inhibits SLP
vectorization?

[Bug tree-optimization/108677] New: wrong vectorization (when copy constructor is present?)

2023-02-05 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108677

Bug ID: 108677
   Summary: wrong vectorization (when copy constructor is
present?)
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this real-life code:

#include <cmath>

struct trig_pair {
   double CosPhi;
   double SinPhi;

   trig_pair() : CosPhi(1.), SinPhi(0.) {}
   trig_pair(const trig_pair &tp) : CosPhi(tp.CosPhi), SinPhi(tp.SinPhi) {}
   trig_pair(const double C, const double S) : CosPhi(C), SinPhi(S) {}
   trig_pair(const double phi) : CosPhi(cos(phi)), SinPhi(sin(phi)) {}

   // Return trig_pair for angle increased by angle of tp.
   trig_pair Add(const trig_pair &tp) {
      return trig_pair(this->CosPhi*tp.CosPhi - this->SinPhi*tp.SinPhi,
                       this->SinPhi*tp.CosPhi + this->CosPhi*tp.SinPhi);
   }
};

trig_pair *TrigArr;

void FillTrigArr(trig_pair tp, unsigned MaxM)
{
   // Fill TrigArr with trig_pair(jp*phi)
   if (!TrigArr) return;
   TrigArr[1] = tp;
   for (unsigned jp = 2; jp <= MaxM; ++jp) TrigArr[jp] = TrigArr[jp-1].Add(tp);
}


GCC vectorizes the loop even though a dependency is present (TrigArr[jp] is computed from TrigArr[jp-1]) [1].
It does not if I comment out the copy constructor [2].


[1]
https://godbolt.org/z/vhPeh35n5

[2]
https://godbolt.org/z/YPjdYdqG8
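
A minimal sketch of a runtime check for the loop above: since each iteration rotates the previous entry by phi, TrigArr[jp] should match (cos(jp*phi), sin(jp*phi)) up to accumulated rounding, and code that mixed up iterations would fail the comparison. The report's types are copied so the sketch compiles on its own; the checker trig_arr_ok and its tolerance are hypothetical choices, not part of the report.

```cpp
#include <cmath>

// Copy of the report's types, so this sketch is self-contained.
struct trig_pair {
    double CosPhi;
    double SinPhi;

    trig_pair() : CosPhi(1.), SinPhi(0.) {}
    trig_pair(const trig_pair &tp) : CosPhi(tp.CosPhi), SinPhi(tp.SinPhi) {}
    trig_pair(const double C, const double S) : CosPhi(C), SinPhi(S) {}
    trig_pair(const double phi) : CosPhi(std::cos(phi)), SinPhi(std::sin(phi)) {}

    // Rotate by the angle of tp: (cos(a+b), sin(a+b)).
    trig_pair Add(const trig_pair &tp) {
        return trig_pair(CosPhi * tp.CosPhi - SinPhi * tp.SinPhi,
                         SinPhi * tp.CosPhi + CosPhi * tp.SinPhi);
    }
};

trig_pair *TrigArr;

void FillTrigArr(trig_pair tp, unsigned MaxM) {
    if (!TrigArr) return;
    TrigArr[1] = tp;
    // Loop-carried dependency: iteration jp reads the result of iteration jp-1.
    for (unsigned jp = 2; jp <= MaxM; ++jp) TrigArr[jp] = TrigArr[jp - 1].Add(tp);
}

// Hypothetical check: returns true when every entry 1..MaxM matches
// (cos(jp*phi), sin(jp*phi)) within tol.
bool trig_arr_ok(double phi, unsigned MaxM, double tol = 1e-9) {
    trig_pair *buf = new trig_pair[MaxM + 1];
    TrigArr = buf;
    FillTrigArr(trig_pair(phi), MaxM);
    bool ok = true;
    for (unsigned jp = 1; jp <= MaxM; ++jp) {
        ok = ok && std::fabs(buf[jp].CosPhi - std::cos(jp * phi)) < tol
                && std::fabs(buf[jp].SinPhi - std::sin(jp * phi)) < tol;
    }
    TrigArr = nullptr;
    delete[] buf;
    return ok;
}
```

Building this with the same flags as the affected build and calling trig_arr_ok gives a quick correctness signal independent of reading the assembly.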

[Bug target/106012] rsqrtps and rcpps instructions generated even if -fno-reciprocal-math specified

2022-12-20 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106012

--- Comment #6 from vincenzo Innocente  ---
Just to confirm that
-Ofast -fno-reciprocal-math -mno-recip
seems to inhibit all reciprocal instructions:
https://godbolt.org/z/f4bccb9GP

[Bug c++/107933] New: std::sqrt complies in intrinsics for float even if --no-builtin is provided

2022-11-30 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107933

Bug ID: 107933
   Summary: std::sqrt complies in intrinsics for float even if
--no-builtin  is provided
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

On x86_64,

float f(float x) { return std::sqrt(x);}
compiles into
sqrtss  xmm0, xmm0
even if --no-builtin is provided, while
double d(double x) { return std::sqrt(x);}
calls libm, as do

float  fs(float x) { return sqrtf(x);}
double ds(double x) { return sqrt(x);}


see
https://godbolt.org/z/Mhf9hv6ns

[Bug tree-optimization/106012] rsqrtps and rcpps instructiona generated even if -fno-reciprocal-math specified

2022-06-19 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106012

vincenzo Innocente  changed:

   What|Removed |Added

Summary|rsqrtss instruction |rsqrtps and rcpps
   |generated even if   |instructiona generated even
   |-mno-recip specified|if -fno-reciprocal-math
   ||specified
 Status|RESOLVED|NEW
 Resolution|WONTFIX |---

--- Comment #3 from vincenzo Innocente  ---
Thanks for the suggestion.

-fno-reciprocal-math does indeed inhibit the scalar reciprocal instructions,
but NOT in vectorized loops.

See:

https://godbolt.org/z/9eMb4Tjee

[Bug target/106012] New: rsqrtss instruction generated even if -mno-recip specified

2022-06-17 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106012

Bug ID: 106012
   Summary: rsqrtss instruction generated even if -mno-recip
specified
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

With the options -Ofast -mno-recip, the rsqrtss instruction is still generated.

https://godbolt.org/z/hGxrG7xPh

Inhibiting rsqrtss and rcpss is critical to obtaining identical results when
running on Intel and AMD platforms, whose approximations differ. Having to give up
-Ofast altogether is clearly a larger performance penalty.
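
Background on why these instructions affect reproducibility: rsqrtss only returns an estimate of 1/sqrt(x) (Intel documents a maximum relative error of 1.5 * 2^-12), and the exact bits are implementation-specific. As far as I know, the refined sequence used with reciprocal approximations is a single Newton-Raphson step, which shrinks the error but still starts from the hardware estimate. A minimal sketch using SSE intrinsics, assuming x86-64; rsqrt_estimate, rsqrt_nr and rsqrt_rel_err are hypothetical names.

```cpp
#include <cmath>
#include <immintrin.h>

// Hardware estimate of 1/sqrt(x) via rsqrtss (relative error <= 1.5 * 2^-12).
inline float rsqrt_estimate(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// One Newton-Raphson step y1 = y0 * (1.5 - 0.5 * x * y0 * y0), which roughly
// squares the relative error; the result still depends on the vendor's y0.
inline float rsqrt_nr(float x) {
    const float y0 = rsqrt_estimate(x);
    return y0 * (1.5f - 0.5f * x * y0 * y0);
}

// Relative error of an approximation y of 1/sqrt(x), measured in double.
inline double rsqrt_rel_err(float x, float y) {
    const double exact = 1.0 / std::sqrt(static_cast<double>(x));
    return std::fabs(static_cast<double>(y) - exact) / exact;
}
```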

[Bug tree-optimization/104950] New: GCC does not emit branchless code

2022-03-16 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950

Bug ID: 104950
   Summary: GCC does not emit branchless code
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this example GCC fails to emit branchless code, while Clang does.
In the actual application, measurements show a slowdown of up to a factor of 2.
I managed to force branchless code (-DBL), but the code is pretty unfriendly.
Godbolt link (GCC, Clang, GCC -DBL):

https://godbolt.org/z/KWY1rjhhY

And here inlined:

#include <vector>

const float defaultBaseResponse = 0.5;
class DForest {
public:
  // based on FastForest::evaluate() and BDTree::parseTree()
  DForest() {
  }
  float evaluate(const float* features) const;

  std::vector<int> rootIndices_;
  // "node" layout: cut, index, left, right
  struct Node {
    float v; int i, l, r;
    constexpr int eval(float const * f) const {
#ifdef BL
      auto m = f[i] > v;
      return *((&l) + int(m));  // reads r when m is true; relies on l and r being adjacent
#else
      return f[i] > v ? r : l;
#endif
    }
  };
  std::vector<Node> nodes_;
  std::vector<float> responses_;
  std::vector<float> baseResponses_;
};

float DForest::evaluate(const float* features) const {
  float sum{defaultBaseResponse + baseResponses_[0]};
  for (int index : rootIndices_) {
    do {
      index = nodes_[index].eval(features);
    } while (index > 0);
    sum += responses_[-index];
  }
  return sum;
}
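
A sketch of one alternative to the *((&l) + int(m)) trick, which reads r through a pointer derived from l and is therefore formally not valid C++: store the two children in an array and index it with the comparison result. The Node2 layout and the walk helper are hypothetical, and whether GCC then emits branchless code for it is not guaranteed.

```cpp
#include <vector>

// Hypothetical node layout: child[0] = left, child[1] = right.
struct Node2 {
    float v;
    int i;
    int child[2];
    // The comparison yields 0 or 1 and selects the child with an indexed
    // load, so the source contains no branch.
    int eval(const float* f) const { return child[f[i] > v]; }
};

// Walk one tree until a leaf (index <= 0), as in DForest::evaluate;
// returns the leaf slot that would index responses_.
int walk(const std::vector<Node2>& nodes, int index, const float* features) {
    do {
        index = nodes[index].eval(features);
    } while (index > 0);
    return -index;
}
```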

[Bug tree-optimization/97707] avx512 math function invoked even if -mprefer-vector-width=256 specified

2020-11-04 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97707

--- Comment #3 from vincenzo Innocente  ---
The main point of using -mprefer-vector-width=256 is to avoid clock throttling
in "mixed" workloads.
In small benchmarks like this one, AVX-512 is faster (even on an old Xeon Silver), even
if it triggers a lower clock (and the test should be performed with the machine
fully loaded). Still, if I ask for -mprefer-vector-width=256, I expect no
512-bit-wide instructions to be used.

Another disturbing feature is the difference between using int or long long as
the loop index.

[Bug tree-optimization/97707] New: avx12 math function invoked even if -mprefer-vector-width=256 specified

2020-11-03 Thread vincenzo.innocente at cern dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97707

Bug ID: 97707
   Summary: avx12 math function invoked even if
-mprefer-vector-width=256 specified
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

This code invokes _ZGVeN8v_sin instead of _ZGVdN4v_sin, making use of zmm
registers:
#include <cmath>

int main() {

  double res = 0;

  for (int x = 0; x < 1024; x++) {
    double y = x;
    res += std::sin(y);
  }

  return res > 0.5;

}

NOTE: if I write
for (long long x=0; x<1024;x++) {

it correctly invokes _ZGVdN4v_sin (no zmm).


Compiler options:
-Ofast -march=skylake-avx512 -mprefer-vector-width=256
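
For reference, these names follow the x86-64 vector function ABI used by glibc's libmvec: _ZGV, an ISA letter (b = SSE, c = AVX, d = AVX2, e = AVX-512), N for an unmasked variant (M for masked), the lane count, one v per vector parameter, then the scalar name. So _ZGVdN4v_sin processes 4 doubles (256 bits, ymm) and _ZGVeN8v_sin processes 8 doubles (512 bits, zmm). A small hypothetical decoder that handles only this simple form:

```cpp
#include <cstddef>
#include <string>

// Hypothetical decoder for simple vector-ABI names such as "_ZGVdN4v_sin":
// returns lanes * elem_bits, or 0 if the name does not match the pattern.
unsigned vector_width_bits(const std::string& name, unsigned elem_bits) {
    const std::string prefix = "_ZGV";
    if (name.compare(0, prefix.size(), prefix) != 0 ||
        name.size() < prefix.size() + 3)
        return 0;
    const char isa = name[prefix.size()];       // b, c, d or e
    const char mask = name[prefix.size() + 1];  // N (unmasked) or M (masked)
    if ((isa != 'b' && isa != 'c' && isa != 'd' && isa != 'e') ||
        (mask != 'N' && mask != 'M'))
        return 0;
    // Parse the decimal lane count that follows the mask letter.
    std::size_t pos = prefix.size() + 2;
    unsigned lanes = 0;
    while (pos < name.size() && name[pos] >= '0' && name[pos] <= '9')
        lanes = lanes * 10 + static_cast<unsigned>(name[pos++] - '0');
    return lanes * elem_bits;
}
```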