[Bug c++/108737] New: Apparent miscompile of infinite loop on gcc trunk in cddce2 pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108737

            Bug ID: 108737
           Summary: Apparent miscompile of infinite loop on gcc trunk in
                    cddce2 pass
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: njs at pobox dot com
  Target Milestone: ---

There's a meme going around about strange compilation outputs caused by
infinite-loop-related UB, and while discussing it a friend stumbled on this
weird behavior: https://godbolt.org/z/qfj1jabMW

The C++ code is:

    extern int foo();

    void blah() {
        int x = foo();
        while (1) {
            if (x)
                foo();
        }
    }

When compiled with whatever "gcc trunk" means on Compiler Explorer, with -O3
(but not -O2), as C++ (but not C), gcc reduces this to a single call to
`foo()` followed by an empty infinite loop:

    blah():
            sub     rsp, 8
            call    foo()
    .L2:
            jmp     .L2

As far as I can tell, there's no UB here -- `foo` being extern means it might
do anything, and infinite loops with side effects are well-defined in C++. So
this seems like a straight-up optimizer bug?

Poking through the pass analysis in CE, the output of "unrolljam (tree)" still
looks correct, but the output of the next pass, "cddce2 (tree)", has deleted
everything. Somehow it concludes that the hoisted `if` can be eliminated? I
don't get it.
[Bug libgomp/60035] [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

--- Comment #13 from Nathaniel J. Smith ---
Unfortunately, AFAICT, the omp_pause_resource APIs don't actually solve the
problem. They're fine and useful if you have a single piece of code that wants
to use both omp and fork(). But that was never a *huge* issue, because if you
knew you were using omp then you also knew that fork() wasn't going to work,
and could use some workaround.

The really nasty case is when the code using omp and the code using fork() are
in entirely different pieces of code that don't know about each other (e.g.,
different shared libraries). That's the motivating use case for this patch,
and I don't see how the omp_pause_resource APIs help with it. The best you
could do is to set up a pre-fork hook to call omp_pause_resource_all, but that
would be equivalent to my first patch, which was rejected for breaking
standards-compliant programs.

(In practice, the effect of this issue has been that the whole scientific
Python ecosystem simply avoids omp wherever possible. That's why no one's been
nagging about this patch. It still seems like a shame to me that all this work
goes into the omp runtime and then it gets ruled out for so many users over
such a trivial thing, but so it goes, I guess.)
[Bug target/55307] libgcc's __cpu_indicator_init does not check for avx correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55307

Nathaniel J. Smith changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |njs at pobox dot com

--- Comment #5 from Nathaniel J. Smith ---
This was fixed in bug #85100.
[Bug target/85100] __builtin_cpu_supports avx does not verify OS supports it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85100

--- Comment #5 from Nathaniel J. Smith ---
Julian, are you able to test the patch? I don't have a reproduction setup
currently...
[Bug target/85100] __builtin_cpu_supports avx does not verify OS supports it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85100

--- Comment #3 from Nathaniel J. Smith ---
We're using it exactly as the docs recommend. What on earth is
__builtin_cpu_supports for, if not to tell you whether you can use a given
feature?

If this is by design, then at the very least the docs need to make clear that
a return value of 1 does not mean you can actually use the feature. Also, it
would be nice if gcc provided some function that *did* answer the question of
which CPU features you can use, since AFAICT this is what every current user
of __builtin_cpu_supports actually wants.
[Bug target/85100] __builtin_cpu_supports avx does not verify OS supports it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85100

Nathaniel J. Smith changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |njs at pobox dot com

--- Comment #1 from Nathaniel J. Smith ---
For context: NumPy currently uses __builtin_cpu_supports("avx") to decide
whether it can use AVX-accelerated numerical kernels. We've been getting
regular bug reports from users where __builtin_cpu_supports("avx") returned
true, but NumPy then crashed with SIGILL when it tried to use AVX. (It seems
to be related to some relatively common virtualization configurations.)
Examples:

https://github.com/numpy/numpy/issues/10787
https://github.com/numpy/numpy/issues/9532
https://github.com/numpy/numpy/issues/10330
https://github.com/numpy/numpy/issues/9534

Now that Julian has finally figured it out, I guess we'll work around it by
calling xgetbv ourselves:

https://github.com/numpy/numpy/pull/10814

but it really seems like it would be better if __builtin_cpu_supports checked
this itself.
[Bug libgomp/60035] [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

--- Comment #8 from Nathaniel J. Smith ---
(In reply to larsmans from comment #7)
> Phase 1? (Not familiar with the GCC dev cycle.)

Sorry, I meant "stage 1". GCC trunk is (IIUC) currently in
RC-bug-fixes-only pre-release freeze mode: http://gcc.gnu.org/develop.html
[Bug libgomp/60035] [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

--- Comment #6 from Nathaniel J. Smith ---
(In reply to larsmans from comment #4)
> Nathaniel, could you apply the cosmetic changes suggested at
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00860.html? I'd hate to see
> this patch go to waste.

If you look at that thread you'll see I did resend the patch with those fixes
-- I've now attached the updated patch to this bug report as well, thanks for
the catch. My guess is that no one will pay much attention to this until gcc
re-enters phase 1 in any case.
[Bug libgomp/60035] [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

Nathaniel J. Smith changed:

           What            |Removed |Added
----------------------------------------------------------------------------
  Attachment #32019 is     |0       |1
  obsolete                 |        |

--- Comment #5 from Nathaniel J. Smith ---
Created attachment 32548
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32548&action=edit
patch to make openmp -> quiesce -> fork -> openmp work (updated)

Updated based on feedback from Richard Henderson.
[Bug libgomp/60035] [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

--- Comment #2 from Nathaniel J. Smith ---
Good point -- sent: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00813.html
[Bug libgomp/60035] New: [PATCH] make it possible to use OMP on both sides of a fork (without violating standard)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

            Bug ID: 60035
           Summary: [PATCH] make it possible to use OMP on both sides of a
                    fork (without violating standard)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: njs at pobox dot com
                CC: jakub at gcc dot gnu.org

Created attachment 32019
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32019&action=edit
patch to make openmp -> quiesce -> fork -> openmp work

This is a re-open of #52303 and #58378, with more arguments, and a proposed
patch that fixes the problem without violating the OpenMP standard.

Background: Almost all scientific/numerical code delegates linear algebra
operations to some optimized BLAS library. Currently, the main contenders for
this library are:

1) ATLAS: free software, but uses extensive build-time configuration, which
   means it must be re-compiled from source by every user to achieve
   competitive performance.

2) MKL: proprietary, but technically excellent.

3) OpenBLAS: free software, but uses OpenMP for threading, which means that
   any program which does linear algebra and also expects fork() to work is
   screwed [1], at least when using GCC.

This means that for projects like numpy, which are used in a very large range
of downstream products, we are pretty much screwed too. Many of our users use
fork(), for various good reasons that I can elaborate on if desired, so we
can't just recommend OpenBLAS in general -- ATLAS or MKL are superior for .
But recompiling ATLAS is difficult, so we can't recommend that as a general
solution, or provide it in pre-compiled downloads.
So what we end up doing is shipping a slow, unoptimized BLAS, while all the
major "scientific Python" distros ship MKL; we also get constantly pressured
by users to either ship binaries with MKL or with OpenBLAS built with icc; and
we field a new bug report every week or two from people who use OpenBLAS
without realizing it and hit mysterious hangs. (Sometimes other projects get
caught in the crossfire too, e.g. [2], where someone is trying to figure out
why their web app can't generate plot graphics when run under the celery job
queue manager.) Meanwhile, people are waiting with bated breath for clang to
get an OpenMP implementation so that they can shift their whole stack over
there, solely because of this one bug. Basically, the current situation causes
ongoing pain for a wide range of people and makes free software uncompetitive
with proprietary software for scientific code using Python in general.

But it doesn't have to be this way! In actual practice on real
implementations -- regardless of what POSIX says -- it's perfectly safe to
use arbitrary POSIX APIs after fork(), so long as all threads are in a known,
quiescent state when the fork occurs. The attached patch has essentially no
impact on compliant OpenMP-using programs; in particular, and unlike the
patch in #58378, it has no effect on the behavior of the parent process, and
in the child process it does nothing that violates POSIX unless the user has
violated POSIX first. But it makes it safe in practice to use OpenMP
encapsulated within a serial library API, without mysterious breakage
depending on far-away parts of the program, and in particular it should fix
the OpenBLAS issue.

The test case included in the patch is by Olivier Grisel, from #58378. The
patch is against current gcc svn trunk (r206297).

[1] https://github.com/xianyi/OpenBLAS/issues/294
[2] https://github.com/celery/celery/issues/1842