[Bug c++/80242] [C++17+] "Trailing return types" with "non-type template arguments" which could be "constant expressions" produce a parsing error

2023-06-22 Thread roland at rschulz dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80242

Roland Schulz  changed:

   What|Removed |Added

 CC||roland at rschulz dot eu

--- Comment #2 from Roland Schulz  ---
Is https://stackoverflow.com/questions/76534613/ a duplicate of this?

[Bug c++/98936] [DR1734] Incorrect computation of trivially copyable for class with user-declared move assignment operator, defined as deleted

2021-11-01 Thread roland at rschulz dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98936

Roland Schulz  changed:

   What|Removed |Added

 CC||roland at rschulz dot eu

--- Comment #2 from Roland Schulz  ---
This might be a duplicate of #96288 .

[Bug c++/96288] New: [DR 1734] __is_trivial and __is_trivially_copyable fails for deleted members

2020-07-22 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96288

Bug ID: 96288
   Summary: [DR 1734] __is_trivial and __is_trivially_copyable fails
for deleted members
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

https://godbolt.org/z/snjof8

The resolution of DR 1734 requires that the class:
- has at least one non-deleted copy constructor, move constructor, copy
assignment operator, or move assignment operator, and
- has a trivial, non-deleted destructor

Therefore all 4 static-asserts should pass.

Same bug in LLVM with discussion of ABI impact:
https://bugs.llvm.org/show_bug.cgi?id=39050
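
A minimal sketch of how the conditions quoted above can be probed with the standard trait. The three types are hypothetical and are not the ones from the linked Godbolt page. The result for `DeletedMoveAssign` is only recorded, not asserted, because that is the compiler-dependent case this report is about.

```cpp
#include <type_traits>

// Hypothetical types for illustration; not taken from the Godbolt link.

// All special members are implicit and trivial: trivially copyable.
struct AllImplicit {
    int x;
};

// Copy operations are defaulted (trivial), move assignment is deleted.
// Under the DR 1734 wording quoted above this should still be trivially
// copyable: a non-deleted copy constructor exists and the destructor is
// trivial and not deleted.
struct DeletedMoveAssign {
    DeletedMoveAssign() = default;
    DeletedMoveAssign(const DeletedMoveAssign&) = default;
    DeletedMoveAssign& operator=(const DeletedMoveAssign&) = default;
    DeletedMoveAssign& operator=(DeletedMoveAssign&&) = delete;
    int x;
};

// A user-provided copy constructor is never trivial.
struct UserCopy {
    UserCopy() = default;
    UserCopy(const UserCopy&) {}
    int x;
};

// What the compiler in use reports for the disputed case.
constexpr bool deleted_move_assign_reported_trivially_copyable =
    std::is_trivially_copyable<DeletedMoveAssign>::value;
```

Printing `deleted_move_assign_reported_trivially_copyable` on different compilers is one way to see whether the DR 1734 wording is applied.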

[Bug c++/94628] New: segfault decltype

2020-04-16 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94628

Bug ID: 94628
   Summary: segfault decltype
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

GCC 9.3 and trunk. Internal compiler error: Segmentation fault

#include <type_traits>
#include <utility>

template<int I>
using int_constant = std::integral_constant<int, I>;

template<int I, int... Is, typename F, typename... Args>
auto select(int i, F&& f, Args&&... args) ->
    std::common_type_t<
        decltype(std::forward<F>(f)(int_constant<I>(),
                                    std::forward<decltype(args)>(args)...)),
        decltype(std::forward<F>(f)(int_constant<Is>(),
                                    std::forward<decltype(args)>(args)...))...>
{
    if (i == I)
        return std::forward<F>(f)(int_constant<I>(),
                                  std::forward<decltype(args)>(args)...);
    else {
        if constexpr (sizeof...(Is) > 0)
            return select<Is...>(i, std::forward<F>(f),
                                 std::forward<decltype(args)>(args)...);
    }
}

int t(int i) {
    return select<0, 1>(i, [](auto x){ return int(x); });
}


No problem if `decltype(args)` is replaced with `Args`. 

https://godbolt.org/z/kPpEK8

[Bug libstdc++/91371] std::bind and bind_front don't work with function with call convention

2019-08-06 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91371

--- Comment #4 from Roland Schulz  ---
Are there any known issues with the libc++ solution? Otherwise it seems like
a simpler solution than adding a builtin.

[Bug libstdc++/91371] std::bind and bind_front don't work with function with call convention

2019-08-06 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91371

--- Comment #2 from Roland Schulz  ---
Would you recommend fixing this by adding specializations for the
alternative calling conventions to std::is_function, or by switching to the
libc++ approach?
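
For context, a sketch of a function-type check that does not enumerate calling conventions. It relies on the language rule that only function types and reference types ignore a top-level `const`. I believe this is the idea behind the libc++ approach mentioned above, but that attribution is an assumption, and `is_function_sketch` is a hypothetical name.

```cpp
#include <type_traits>

// Hypothetical trait. A type is treated as a function type if adding const
// has no effect and it is not a reference. No partial specializations per
// signature or per calling convention are needed.
template <class T>
struct is_function_sketch
    : std::integral_constant<bool,
                             !std::is_const<const T>::value &&
                             !std::is_reference<T>::value> {};
```

Because the check never pattern-matches on the function signature, an attribute that changes the calling convention of the function type should not affect the result.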

[Bug libstdc++/91371] New: std::bind and bind_front don't work with function with call convention

2019-08-05 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91371

Bug ID: 91371
   Summary: std::bind and bind_front don't work with function with
call convention
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

If a function whose type carries a calling-convention attribute is passed to
std::bind or std::bind_front, compilation fails.

Reproducer:
#include <functional>

int bar(int) __attribute__((ms_abi)); //same with fastcall, thiscall

void test() {
std::bind(bar, 5)(); //error: function returning a function
std::bind_front(bar, 5)(); //error: static assertion failed
}

Godbolt showing it works with libc++:
https://godbolt.org/z/3g7Vk7

[Bug libstdc++/87020] New: comparison operator isn't called for stateless allocator without is_always_equal for C++11/14

2018-08-19 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87020

Bug ID: 87020
   Summary: comparison operator isn't called for stateless
allocator without is_always_equal for C++11/14
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

For an empty allocator the comparison operators aren't used, unless
is_always_equal=std::false_type is present. This is the correct behavior with
C++17 but for C++11/14 is_always_equal shouldn't have any effect.

This is low impact because any well defined and state-less allocator should
always be equal. But it isn't strictly compliant. This is present since GCC 6.

Reproducer:
https://godbolt.org/z/-5IHwC
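
A small sketch of the trait involved, for readers without access to the reproducer. Both allocators are hypothetical and are not taken from the Godbolt link. In C++17, `std::allocator_traits<A>::is_always_equal` falls back to `std::is_empty<A>` unless the allocator declares the member itself.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical empty (stateless) allocator without is_always_equal.
template <class T>
struct StatelessAlloc {
    using value_type = T;

    StatelessAlloc() = default;
    template <class U>
    StatelessAlloc(const StatelessAlloc<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const StatelessAlloc<T>&, const StatelessAlloc<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const StatelessAlloc<T>&, const StatelessAlloc<U>&) noexcept {
    return false;
}

// Same allocator, but explicitly opting out of the "always equal" shortcut.
// Only the trait is queried here; this is not a complete allocator.
template <class T>
struct OptOutAlloc : StatelessAlloc<T> {
    using is_always_equal = std::false_type;
};
```

With these definitions the trait is true for `StatelessAlloc<int>` and false for `OptOutAlloc<int>`, which is the distinction the report says should only matter from C++17 on.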

[Bug c++/83936] [feature request] Allow constexpr char* as target argument

2018-01-18 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83936

--- Comment #2 from Roland Schulz  ---
Do you mean for the target attribute or for all attributes in general? The
following example suggests that for the alloc_align attribute it works to have
the argument depend on a template argument.

template<int xx>
[[gnu::alloc_align(xx)]]
void *a(int align);

void test(int align)
{
a<1>(align);
a<2>(align);
}

$ g++ test_attr.cc -Wall -Wextra -c
test_attr.cc: In substitution of 'template<int xx> void* a(int) [with int xx = 2]':
test_attr.cc:8:15:   required from here
test_attr.cc:3:7: warning: alloc_align parameter outside range [-Wattributes]
 void *a(int align);

[Bug c++/83936] New: [feature request] Allow constexpr char* as target argument

2018-01-18 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83936

Bug ID: 83936
   Summary: [feature request] Allow constexpr char* as target
argument
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

It would be very useful to be able to specify a "constexpr const char*const"
string as an argument to the target attribute. This would make it possible to
set the target for templates where the target should depend on some template
argument. Example:

struct AVX
{
static constexpr const char*const target = "avx"; 
};

template<class T>
[[gnu::target(T::target)]]
void test() {}

void f() {
test<AVX>();
}

[Bug middle-end/49363] [feature request] multiple target attribute (and runtime dispatching based on cpuid)

2018-01-18 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363

Roland Schulz  changed:

   What|Removed |Added

 CC||roland at rschulz dot eu

--- Comment #25 from Roland Schulz  ---
I believe this can be closed because it has been implemented as the
target_clones attribute.

[Bug c++/83875] [feature request] target_clones compatible SIMD capability/length check

2018-01-18 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875

--- Comment #8 from Roland Schulz  ---
I would suggest that:
- inside multi-versioned (target_clones/target) function it depends on the
active target
- inside a constexpr context (function/variable, your examples) or
always_inline function it depends on the caller
- otherwise returns the default target

I assume that this should result in always returning the target being used.

[Bug c++/83875] [feature request] target_clones compatible SIMD capability/length check

2018-01-17 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875

--- Comment #5 from Roland Schulz  ---
(In reply to Jakub Jelinek from comment #4)
> So we are essentially talking about a builtin like
> __builtin_cpu_{is,supports}, that instead of runtime check would query the
> target flags of the containing function (if any) or the compilation unit
> default (if outside of function).

Exactly.

[Bug c++/83875] [feature request] target_clones compatible SIMD capability/length check

2018-01-16 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875

--- Comment #3 from Roland Schulz  ---
Instead of adding (or modifying) a capability check, the goal could also be
achieved by allowing a target+constexpr function to be called from a
target_clones function. Currently this gives: "error: call to non-constexpr
function" (example given below). Note that calling a target+constexpr function
from a target function (with a target equal to or higher than the highest
target) already works (commented-out line below). This suggests that the
compiler already understands that the call is constexpr when the caller and
callee have the same target, but this analysis doesn't seem to be applied for
target_clones.

__attribute__ ((target ("default")))
static inline constexpr int foo() { return 1; }

__attribute__ ((target ("avx2")))
static inline constexpr int foo() { return 2; }

__attribute__((target_clones("avx2","default")))
//__attribute__ ((target ("avx2"))) //if this is used instead, it compiles
int test()
{
constexpr int i = foo();
return i;
}

int main()
{
return test();
}

[Bug c++/83911] New: ICE with target attribute on constructor in gimplify_expr at gimplify.c:11321

2018-01-16 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83911

Bug ID: 83911
   Summary: ICE with target attribute on constructor in
gimplify_expr at gimplify.c:11321
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---
Target: x86-64

> g++ test_target2_min.cc -c
test_target2_min.cc: In function 'SimdFloat foo()':
test_target2_min.cc:13:12: internal compiler error: in gimplify_expr, at
gimplify.c:12193
 return 1;
^
0x8d839b gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
../../gcc-7.2.0/gcc/gimplify.c:12193

test_target2_min.cc:
class SimdFloat
{
public:
__attribute__ ((target ("default")))
SimdFloat(float x) {}

__attribute__ ((target ("avx2")))
SimdFloat(float x) {}
};

SimdFloat foo()
{
return 1;
}

[Bug c++/83875] [feature request] target_clones compatible SIMD capability/length check

2018-01-16 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875

--- Comment #2 from Roland Schulz  ---
The main problem is that it already gets resolved in the preprocessor stage.
Thus if you have:

__attribute__((target_clones("avx","default")))
void foo(){
#if __AVX__
...
#endif
}

, it doesn't work. __AVX__ is set depending on default even for the avx clone.

Outside of target_clones it would have only a very minor advantage, that it
would work directly because it would be 1/0 rather than 1/undef, and thus
wouldn't require wrapping to be used in a constexpr context.
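
A sketch of the wrapping mentioned above, which turns the 1/undefined macro into a value usable with `if constexpr`. The names `compiled_for_avx` and `simd_float_width` are hypothetical, and the widths 8 and 4 are illustrative assumptions. Like the macro itself, this only reflects the default target of the translation unit, so it does not solve the target_clones case.

```cpp
// Wrap the 1/undefined macro into a constant that always exists.
constexpr bool compiled_for_avx =
#ifdef __AVX__
    true;
#else
    false;
#endif

// Hypothetical helper: floats per SIMD register for the default target
// of this translation unit (8 with AVX, 4 assumed otherwise).
constexpr int simd_float_width() {
    if constexpr (compiled_for_avx) {
        return 8;
    } else {
        return 4;
    }
}
```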

[Bug other/83876] New: [feature request] flag to force vague linkage for typeinfo and/or disable vtable anchoring

2018-01-15 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83876

Bug ID: 83876
   Summary: [feature request] flag to force vague linkage for
typeinfo and/or disable vtable anchoring
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

Vague linkage is required for runtime binding [1] and RTTI to work together. If
vtable anchoring is possible, GCC makes use of it and doesn't use vague
linkage for typeinfo. This breaks runtime binding [2]. It would be nice if GCC
added a flag that disables vtable anchoring and/or forces vague linkage
even if vtable anchoring is possible. This would make runtime binding via
dlopen usable even if out-of-line virtual functions exist.


1)
https://stackoverflow.com/questions/29524200/how-to-do-runtime-binding-based-on-cpu-capabilities-on-linux
2)
https://stackoverflow.com/questions/48246196/virtual-exception-class-causes-dynamic-linker-error
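
To illustrate the two cases, a sketch with two hypothetical classes. Under the Itanium C++ ABI, the first virtual function that is neither pure nor inline at the point of the class definition is the key function, and vtable and typeinfo are emitted only where it is defined. A class without such a function gets them with vague linkage.

```cpp
#include <type_traits>

// Anchored: the destructor is declared in the class but defined out of line,
// so it is the key function. Vtable and typeinfo are emitted only in the
// translation unit that contains this definition.
struct Anchored {
    virtual ~Anchored();
    virtual int id() const;
};
Anchored::~Anchored() = default;  // out-of-line definition anchors the vtable
int Anchored::id() const { return 1; }

// Vague: every virtual function is inline, so there is no key function and
// vtable and typeinfo are emitted with vague linkage wherever they are needed.
struct Vague {
    virtual ~Vague() = default;
    virtual int id() const { return 2; }
};

static_assert(std::is_polymorphic<Anchored>::value, "has a vtable");
static_assert(std::is_polymorphic<Vague>::value, "has a vtable");
```

The requested flag would make `Anchored` behave like `Vague` with respect to typeinfo emission.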

[Bug c++/83875] New: [feature request] target_clones compatible SIMD capability/length check

2018-01-15 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875

Bug ID: 83875
   Summary: [feature request] target_clones compatible SIMD
capability/length check
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

Currently there are two methods to check for SIMD capabilities (e.g. AVX): the
preprocessor defines (e.g. __AVX__) and the builtin __builtin_cpu_supports. The
problem is that neither can be used together with target_clones. One is
evaluated too early (preprocessor) and the other too late (runtime). I suggest
adding a builtin function that is constexpr, returns the capabilities of the
target being compiled for, and otherwise works like the existing
__builtin_cpu_supports. A possible name would be __builtin_target_supports.

Even outside of target_clones function, such a builtin would have the advantage
that it would allow replacing preprocessor #if with constexpr if.

Prior to filing the feature request I asked on gcc-help for a solution:
https://gcc.gnu.org/ml/gcc-help/2018-01/msg00057.html . No one had an idea for
a solution with current capabilities, and one other developer voiced interest
in such a feature.
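
For comparison, a sketch of the runtime method using the existing `__builtin_cpu_supports`. The helper names and the `CodePath` enum are hypothetical, and treating non-x86 or non-GNU compilers as "no AVX" is an assumption made for portability of the example.

```cpp
#include <cassert>

// Runtime view: asks the CPU the program is currently running on.
inline bool avx_available_at_runtime() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // harmless if the CPU model was already detected
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;  // assumption: no AVX outside x86 with a GNU-compatible compiler
#endif
}

enum class CodePath { Generic, Avx };

// Hypothetical dispatcher: the decision is made on every call at runtime,
// so it cannot be used in a constexpr context or to select a type.
inline CodePath select_code_path() {
    return avx_available_at_runtime() ? CodePath::Avx : CodePath::Generic;
}

// Sanity check: a binary built with AVX enabled by default should only run
// on a CPU that reports AVX.
inline void require_compiled_features() {
#ifdef __AVX__
    assert(avx_available_at_runtime() && "built for AVX but the CPU lacks it");
#endif
}
```

The result of `select_code_path()` is only known when the program runs, which is why it is too late for choosing a vector width at compile time inside a clone.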

[Bug c++/81957] New: ICE decltype

2017-08-23 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81957

Bug ID: 81957
   Summary: ICE decltype
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu
  Target Milestone: ---

#include <type_traits>

struct f {
template<bool b, typename Int>
void operator()(std::integral_constant<bool,b>, Int i) {
}
};

template<bool...Bs, typename F, typename ...T>
auto dispatch(F f, T...t) -> decltype(f(std::integral_constant<bool,Bs>()...,
t...)) {
return f(std::integral_constant<bool,Bs>()..., t...);
}

template<bool...Bs, typename F, typename ...T>
auto dispatch(F f, bool b, T...t) -> decltype(dispatch<Bs..., true>(f, t...)) {
if (b)
return dispatch<Bs..., true>(f, t...);
else
return dispatch<Bs..., false>(f, t...);
}

int main() {
dispatch(f(), true, 5);
return 0;
}

gives:
Internal compiler error: Error reporting routines re-entered.

Without the 2nd decltype (either using void or no return type specification) it
compiles fine.

The problem is present in all supported GCC versions.
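
A sketch of the variant described above that avoids the second decltype by giving the recursive overload a plain `void` return type. The `Recorder` functor is hypothetical and only exists so the effect of the dispatch can be observed.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical functor: stores +i or -i depending on the compile-time flag.
struct Recorder {
    int* out;
    template <bool b, typename Int>
    void operator()(std::integral_constant<bool, b>, Int i) const {
        assert(out != nullptr);
        *out = b ? static_cast<int>(i) : -static_cast<int>(i);
    }
};

// Base case: every runtime bool has become a template argument.
template <bool... Bs, typename F, typename... T>
auto dispatch(F f, T... t)
    -> decltype(f(std::integral_constant<bool, Bs>()..., t...)) {
    return f(std::integral_constant<bool, Bs>()..., t...);
}

// Recursive case: 'void' replaces the second decltype, so the trailing
// return type no longer refers back to dispatch itself.
template <bool... Bs, typename F, typename... T>
void dispatch(F f, bool b, T... t) {
    if (b)
        dispatch<Bs..., true>(f, t...);
    else
        dispatch<Bs..., false>(f, t...);
}
```

Calling `dispatch(Recorder{&r}, true, 5)` selects the recursive overload first, because the base case is removed by SFINAE when a plain `bool` is passed where an `integral_constant` is expected.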

[Bug target/54412] Request for 32-byte stack alignment with -mavx on Windows

2015-09-23 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #13 from Roland Schulz  ---
But this problem is limited to GCC. ICC, Clang and MSVC don't have the problem
with compiling 64bit AVX code. Thus they must have some kind of work-around for
ABI and GCC should be able to use a work-around too (at least in theory).


[Bug target/54412] Request for 32-byte stack alignment with -mavx on Windows

2014-09-20 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #10 from Roland Schulz roland at rschulz dot eu ---
Created attachment 33520
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33520&action=edit
Slightly modified testcase

This slightly modified testcase in which the return value isn't stored, still
segfaults for me. With the 32bit mingw64 binary ((i686-win32-dwarf-rev1, Built
by MinGW-W64 project) 4.9.1) it is OK, but with the 64bit binary
((x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.9.1) it segfaults.


[Bug target/54412] Request for 32-byte stack alignment with -mavx on Windows

2014-09-04 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #7 from Roland Schulz roland at rschulz dot eu ---
For me the problem isn't fixed with gcc 4.9.1. I tried two builds: a)
http://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win32/Personal%20Builds/mingw-builds/installer/mingw-w64-install.exe/download
and b) http://nuwen.net/mingw.html. Did you use a special distribution or
special flags if you compiled gcc yourself?


[Bug target/54412] Request for 32-byte stack alignment with -mavx on Windows

2014-09-03 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #5 from Roland Schulz roland at rschulz dot eu ---
This seems to me to be a duplicate of 49001.


[Bug target/61730] Cygwin AVX __m256i return value misaligned

2014-09-03 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61730

Roland Schulz roland at rschulz dot eu changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Roland Schulz roland at rschulz dot eu ---
Duplicate

*** This bug has been marked as a duplicate of bug 49001 ***


[Bug target/49001] GCC uses VMOVAPS/PD AVX instructions to access stack variables that are not 32-byte aligned

2014-09-03 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49001

--- Comment #4 from Roland Schulz roland at rschulz dot eu ---
*** Bug 61730 has been marked as a duplicate of this bug. ***


[Bug target/61730] Cygwin AVX __m256i return value misaligned

2014-09-01 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61730

--- Comment #1 from Roland Schulz roland at rschulz dot eu ---
It is probably a duplicate of 54412 or 49001 (which seem to be duplicates of
each other). The bugs I found previously were about 16-byte alignment, and that
has been fixed. But the 32-byte alignment required for AVX doesn't seem to be
supported under Windows (both under Cygwin and MinGW).


[Bug sanitizer/55561] TSAN: provide a TSAN instrumented libgomp

2014-07-09 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55561

--- Comment #50 from Roland Schulz roland at rschulz dot eu ---
I must say I don't know how the internals work. But I assume that reductions
are implemented in libgomp (I know they are in iomp). Thus for any code which
uses OpenMP reduce statements, libgomp would touch user data.


[Bug target/61730] New: Cygwin AVX __m256i return value misaligned

2014-07-06 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61730

Bug ID: 61730
   Summary: Cygwin AVX __m256i return value misaligned
   Product: gcc
   Version: 4.8.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roland at rschulz dot eu

Created attachment 33079
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33079&action=edit
Testcase which segfaults on Cygwin because of incorrect alignment

The attached testcase segfaults when compiled with gcc 4.8.3 on Cygwin. It is
fine when compiled and run on Linux and MinGW. The debugger shows that it
segfaults on the vmovdqa generated for the return of type __m256i, because the
memory isn't aligned.
Compiled with: g++ -mavx test2.cc.i  -g

Possible duplicates (the reason why I think each probably isn't one is in
parentheses):
16890 (this is supposed to be fixed, so I don't think it is a duplicate)
33774 (the subject says Cygwin/mingw but the text only mentions mingw - and
this is only Cygwin)
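
A portable sketch of the alignment condition involved. Aligned 256-bit moves such as vmovdqa require the address to be a multiple of 32. `Vec256` is a hypothetical stand-in for `__m256i` so that the example does not need AVX headers, and `is_aligned` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is a multiple of 'alignment', which must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical stand-in for a 256-bit AVX value with the same size and
// alignment requirement.
struct alignas(32) Vec256 {
    unsigned char bytes[32];
};

static_assert(alignof(Vec256) == 32, "32-byte alignment requested");
static_assert(sizeof(Vec256) == 32, "256 bits");
```

Checking `is_aligned(&value, 32)` on a returned or stack-allocated value is one way to confirm a misalignment like the one the debugger showed.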


[Bug sanitizer/55561] TSAN: provide a TSAN instrumented libgomp

2014-05-14 Thread roland at rschulz dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55561

--- Comment #47 from Roland Schulz roland at rschulz dot eu ---
Using 4.9 and --disable-linux-futex I don't get any false positives. Thus the
problem I saw with 4.8.2 is indeed fixed with 4.9. Thanks!

What is the advantage of a TSAN instrumented libgomp over one with
--disable-linux-futex?


[Bug sanitizer/55561] TSAN: provide a TSAN instrumented libgomp

2014-05-08 Thread roland at rschulz dot eu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55561

Roland Schulz roland at rschulz dot eu changed:

   What|Removed |Added

 CC||roland at rschulz dot eu

--- Comment #44 from Roland Schulz roland at rschulz dot eu ---
If I run tsan on our code with libgomp compiled with --disable-linux-futex, I
only see false positives for omp-atomic constructs. Everything else seems fine.
If I compile libgomp with tsan and without --disable-linux-futex I get a lot of
false positives. And if I compile libgomp with both tsan and
--disable-linux-futex, I also get the false positives for omp-atomic. I used
gcc 4.8.2.

For those who reported success with compiling libgomp with tsan:
- Do you also use --disable-linux-futex or did you only use -fsanitize=thread?
- Did you test with code using #pragma omp atomic update?

Is there a way to compile libgomp to not get false positives for omp-atomic?


[Bug gcov-profile/47618] Collecting multiple profiles and using all for PGO

2012-07-24 Thread roland at rschulz dot eu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47618

--- Comment #9 from Roland Schulz roland at rschulz dot eu 2012-07-24 23:52:41 UTC ---
I think a tool to merge would be a good partial solution.

As far as I can see, what would still be missing for user-friendly usage is a
mechanism to guarantee that all pre-merge files are saved under different
names, so that different processes don't overwrite each other's output files.
In the case of MPI one would want the MPI rank to be part of the output
folder to guarantee unique file names. Hence my suggestion to support
-fprofile-dir /some/path/%q{SOME_ENV}, where SOME_ENV would be the environment
variable containing the MPI rank. Without being able to make the output path
depend on an environment variable, one would have to write wrapper
scripts, and that might not even be possible in all cases.
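
A sketch of what an offline merge step could look like once each rank has written its own file. This is an assumption-laden illustration: it models a profile as a flat vector of execution counters and merges by element-wise summation. It does not read or write the real gcov data format, and `Counters` and `merge_profiles` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of one rank's profile: one counter per
// instrumented edge, in the same order for every rank.
using Counters = std::vector<std::uint64_t>;

// Merge per-rank profiles by summing corresponding counters.
inline Counters merge_profiles(const std::vector<Counters>& per_rank) {
    Counters merged;
    if (per_rank.empty()) {
        return merged;
    }
    merged.assign(per_rank.front().size(), 0);
    for (const Counters& rank : per_rank) {
        // All ranks ran the same binary, so the layouts must match.
        assert(rank.size() == merged.size());
        for (std::size_t i = 0; i < rank.size(); ++i) {
            merged[i] += rank[i];
        }
    }
    return merged;
}
```

Because the merge happens after all processes have finished, no file locking is needed while the application runs.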


[Bug gcov-profile/47618] Collecting multiple profiles and using all for PGO

2012-07-24 Thread roland at rschulz dot eu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47618

--- Comment #11 from Roland Schulz roland at rschulz dot eu 2012-07-25 00:50:30 UTC ---
Steven wrote that they are not merged but that race conditions occur. That is
also what I observed. To clarify: Message Passing Interface (MPI) is a
parallelization method which executes the same binary multiple times in
parallel (with support for messages for communication). Merging the
output into one file at runtime would require file locking (often over network
file systems) and would not scale, because MPI applications are often run with
more than 1 (or even 1M) parallel processes simultaneously.


[Bug gcov-profile/47618] New: Collecting multiple profiles and using all for PGO

2011-02-05 Thread roland at rschulz dot eu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47618

   Summary: Collecting multiple profiles and using all for PGO
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: gcov-profile
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: rol...@rschulz.eu


Currently only the file from one profiling run can be used for PGO. Especially
for MPI programs it would be nice if several folders containing profiling files
could be merged or several directories could be used together for
-fprofile-use.

For saving the profiling files it would be great if the folder name could
contain an environment variable or could be set by an environment variable.

Thus I suggest that one could either say:
-fprofile-dir /some/path/%q{SOME_ENV}  #same syntax as valgrind
or
export GCC_PROFILE_DIR=/some/path/$SOME_ENV

This would be very useful because MPI implementations provide the MPI rank as
an environment variable. Thus with this suggestion one could store the profile
of each MPI rank in a different folder.
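
A sketch of how the suggested `%q{NAME}` placeholder could be expanded. This illustrates the proposed syntax only and is not how GCC handles `-fprofile-dir`. The names `expand_profile_dir` and `EnvLookup` are hypothetical, and replacing an unset variable with an empty string is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <string>

// Lookup callback so the expansion can be exercised without touching the
// real process environment.
using EnvLookup = std::function<const char*(const std::string&)>;

inline const char* lookup_process_env(const std::string& name) {
    return std::getenv(name.c_str());
}

// Replace every %q{NAME} in 'pattern' with the value of variable NAME.
// Unset variables expand to nothing; an unterminated placeholder is
// copied through unchanged.
inline std::string expand_profile_dir(const std::string& pattern,
                                      const EnvLookup& lookup = lookup_process_env) {
    assert(static_cast<bool>(lookup));
    std::string out;
    std::size_t pos = 0;
    while (pos < pattern.size()) {
        const std::size_t open = pattern.find("%q{", pos);
        const std::size_t close =
            open == std::string::npos ? std::string::npos : pattern.find('}', open + 3);
        if (close == std::string::npos) {
            out.append(pattern, pos, std::string::npos);  // nothing left to expand
            break;
        }
        out.append(pattern, pos, open - pos);             // literal text before %q{
        const std::string name = pattern.substr(open + 3, close - (open + 3));
        if (const char* value = lookup(name)) {
            out += value;
        }
        pos = close + 1;
    }
    return out;
}
```

With a rank variable set to 7 by the MPI launcher, `/some/path/%q{SOME_ENV}` would expand to `/some/path/7`, giving each rank its own output folder.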