[Bug libgomp/113627] New: Detached tasks released without call to omp_fulfill_event
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113627

            Bug ID: 113627
           Summary: Detached tasks released without call to omp_fulfill_event
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: schuchart at icl dot utk.edu
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57236
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57236&action=edit
Pre-processed reproducer

We saw a problem in a benchmark OpenMP application that executes a loop in which two tasks are created per iteration. The two tasks of each iteration are chained through a dependency on an array element, and the first task is detached. We found that the second (dependent) task is executed as soon as the first task has executed, even though the event has not been fulfilled.

I'm attaching the preprocessed sources of a reproducer (it's as small as I could get it; apologies if it's still too complex). If the execution is correct, the program will hang because none of the events are fulfilled. If the execution is incorrect, an assert will trigger because the second task is executed and the array value has not been set properly (in our benchmark it is set by an outside entity before the event is released).

It is important to note that the issue occurs only with more than 64 iterations when running on a single thread: starting from 65 iterations, the dependent task is executed without the event being fulfilled. If OMP_NUM_THREADS is set to 2, the crossover is at 128/129 iterations.

To build the example:

  $ gcc -g -O0 -fopenmp example_detach.c -o example

To run the example (will hang because the events are never fulfilled):

  $ OMP_NUM_THREADS=1 ./example -t 64

To run the example and trigger the assert because the dependent task is executed prematurely:

  $ OMP_NUM_THREADS=1 ./example -t 65

I'm running on an AMD EPYC Rome machine on a GNU/Linux system.
I see this behavior with a system-wide gcc 12.2.0 installed through Spack and with a gcc 13.2.0 I built myself using this configure command:

  $ ../configure --prefix=$INSTALLDIR --enable-languages=c --disable-multilib --with-pic --disable-bootstrap

Please let me know if I can provide anything else.
[Bug jit/66594] jitted code should use -mtune=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66594

Joseph changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schuchart at icl dot utk.edu

--- Comment #10 from Joseph ---
The lack of target-specific optimizations is biting us quite a bit, and manually specifying an architecture is not really an option unless we duplicate GCC's CPU detection mechanism, which is not ideal.

I am not familiar with the GCC code base, and from the discussion so far it's not clear what would be needed to advance this. If someone could provide some hints on what is missing and how/where it could be implemented, we could probably take a stab at it.

Would it be sufficient to add a macro to the headers of the targets that provide host_detect_local_cpu (as suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66594#c6) and ignore the targets that do not provide it? Or would it be better to hard-code calls to host_detect_local_cpu for the architectures that provide it, as in the referenced patch but with architecture-specific preprocessor guards? We mostly care about i386 and arm/aarch64, but covering all available bases would be necessary, I guess.
[Bug target/55690] On some targets thread_fence is not a compiler barrier when memmodel != MEMMODEL_SEQ_CST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55690

--- Comment #2 from Joseph ---
Created attachment 52626
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52626&action=edit
Reproducer

I created a reproducer (see the attached file or online: https://godbolt.org/z/n76K3Ejds). Note that the acquire fence does not prevent GCC 7 from loading l->b ahead of the loop. With GCC 8 and later, l->b is loaded inside the loop (as it should be).
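The kind of fence-based message passing at stake can be sketched with GCC's __atomic builtins. This is a simplified illustration, not the attached reproducer: the struct name flag_and_data, the flag field a, and the functions publish and wait_then_read are hypothetical (only l->b appears in the comment). The writer stores the payload and then sets the flag with a release store; the reader spins on a relaxed load of the flag and then issues __atomic_thread_fence(__ATOMIC_ACQUIRE). For this to be correct, the acquire fence must also act as a compiler barrier, so that the plain load of l->b is not moved above it, which is the property the comment reports GCC 7 violating in its reproducer.

```c
/* Hypothetical shared state: `a` is a ready flag, `b` the payload. */
struct flag_and_data {
  int a; /* accessed only through __atomic builtins */
  int b; /* plain (non-atomic) payload */
};

/* Writer: publish the payload, then set the flag with release semantics. */
void publish(struct flag_and_data *l, int value)
{
  l->b = value;
  __atomic_store_n(&l->a, 1, __ATOMIC_RELEASE);
}

/* Reader: spin on a relaxed load of the flag, then issue an acquire fence.
   The fence pairs with the release store in publish(), so the load of
   l->b below must observe the published value; neither the compiler nor
   the CPU may move that load above the fence. */
int wait_then_read(struct flag_and_data *l)
{
  while (__atomic_load_n(&l->a, __ATOMIC_RELAXED) == 0)
    ; /* busy-wait until the writer sets the flag */
  __atomic_thread_fence(__ATOMIC_ACQUIRE);
  return l->b;
}
```

Spinning with a relaxed load and placing a single acquire fence after the loop avoids paying for acquire ordering on every iteration; the trade-off is that correctness then rests entirely on the fence being honored by both the compiler and the hardware.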