[Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?

2020-09-26 Thread ttsiodras at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

Thanassis Tsiodras changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Resolution|---                         |FIXED
         Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Thanassis Tsiodras ---
Marking as resolved.

2020-09-26 Thread jakub at gcc dot gnu.org via Gcc-bugs

--- Comment #3 from Jakub Jelinek ---
Note, I think a significant part of the speedup comes from the tail recursion
optimization, which will be prevented even with a mergeable task.  Computing
Fibonacci this way is not efficient.

2020-09-26 Thread ttsiodras at gmail dot com via Gcc-bugs

--- Comment #2 from Thanassis Tsiodras ---
I see. I was not aware of "mergeable", TBH - thanks for pointing it out (it led
me to read about "data environments").

Thanks, Jakub.

2020-09-26 Thread jakub at gcc dot gnu.org via Gcc-bugs

--- Comment #1 from Jakub Jelinek ---
Even with if(false), the implementation has to create a new data environment
etc.
if(false) just means the task will be included, i.e. the generating task will
only continue once the included task finishes, and the generating thread will
execute the task itself.
You'd also need to add the mergeable clause to let the implementation, for
if(false), pretend the task directive wasn't there at all, but that is just an
optimization option that GCC doesn't use right now (it would basically require
copying the region once again).
Also, there is the overhead of the taskwait that you perform unconditionally at
all levels.